Getting Started
---------------

This is a basic overview of how to use the dsgrid-legacy-efs-api ``dsgrid``
package. More extensive examples can be found throughout the
`notebooks `__ and `tests `__.

For accessing the dsgrid EFS data,
`reading data files <#reading-in-an-existing-data-file>`__ and
`working with dsgrid models <#working-with-a-dsgrid-model-collection-of-data-files>`__
are probably of most interest. However, the
`brief primer on how to create new data files <#creating-a-new-data-file>`__
may be useful background information.

Also note that while a ``.dsg`` extension is used for the dsgrid EFS data
files, the underlying format is plain HDF5 and can be browsed with a generic
viewer like `HDFView `__.

Installation
~~~~~~~~~~~~

To get the basic package, run:

::

   pip install dsgrid-legacy-efs-api

If you would like to run the example notebooks and browse the files available
through the Open Energy Data Initiative (OEDI), install the required extra
dependencies:

::

   pip install dsgrid-legacy-efs-api[ntbks,oedi]

and also clone the repository. Then you should be able to run the .ipynb files
in the dsgrid-legacy-efs-api/notebooks folder, which include functionality for
directly browsing the OEDI `oedi-data-lake/dsgrid-2018-efs `__ data files. If
you would like to use the HSDS service, please see the configuration
instructions at `https://github.com/NREL/hsds-examples/ `__.

Creating a new data file
~~~~~~~~~~~~~~~~~~~~~~~~

To begin, create an empty :class:`~dsgrid.dataformat.datafile.Datafile`
object. This involves providing a file path for the HDF5 file that will be
created, and a set of valid :class:`sector `, :class:`geography `,
:class:`enduse `, and :class:`time ` :class:`enumerations `. An
:class:`~dsgrid.dataformat.enumeration.Enumeration` includes both a list of
unique IDs identifying individual allowed values and a matching list of more
descriptive names. The package includes predefined :class:`Enumerations ` for
sector model data.

.. code:: python

   # Module paths follow the dsgrid.dataformat cross-references used in this guide
   from dsgrid.dataformat.datafile import Datafile
   from dsgrid.dataformat.enumeration import (
       sectors_subsectors, counties, enduses, hourly2012
   )

   f = Datafile("data.dsg", sectors_subsectors, counties, enduses, hourly2012)

A :class:`~dsgrid.dataformat.sectordataset.SectorDataset` can now be added to
the :class:`~dsgrid.dataformat.datafile.Datafile`. Note that here “sector”
refers to both levels of the sector/subsector hierarchy. This keeps the format
extensible to less resolved datasets, where data may only be available by
aggregate sector, or even just economy-wide. The following would create a
sector dataset that spans all enduses and time periods, assuming the provided
sector ID exists in ``f``\ ’s
:class:`~dsgrid.dataformat.enumeration.SectorEnumeration`:

.. code:: python

   f.add_sector("res__SingleFamilyDetached")

However, it’s likely that a single sector/subsector will not be drawing load
for all possible end uses. In that case, to save space on disk, the sector can
be defined to use only a subset of the end uses listed in the
:class:`Datafile's ` :class:`~dsgrid.dataformat.enumeration.EndUseEnumerationBase`
ID list:

.. code:: python

   singlefamilydetached = f.add_sector(
       "res__SingleFamilyDetached",
       enduses=["heating", "cooling", "interior_lights"])

One could restrict the dataset to a subset of times in a similar fashion.

Simulation data can now be assigned to the sector (subsector). The data should
be in the form of a Pandas DataFrame with row indices corresponding to IDs in
the :class:`Datafile's ` :class:`~dsgrid.dataformat.enumeration.TimeEnumeration`
and column names corresponding to enduse IDs in the
:class:`Datafile's ` :class:`EndUseEnumeration ` (or the predetermined subset
discussed immediately above). Each DataFrame is assigned to at least one
geography, each of which is represented by an ID in the
:class:`Datafile's ` :class:`~dsgrid.dataformat.enumeration.GeographyEnumeration`.
In this case, ``"08059"`` is the ID and FIPS code for Jefferson County,
Colorado:

.. code:: python

   # Assign one DataFrame to a single county ...
   singlefamilydetached["08059"] = jeffco_sfd_data
   # ... or the same DataFrame to several counties at once
   singlefamilydetached[["08001", "08003", "08005"]] = same_sfd_data_in_many_counties

Individual geographies can be associated with a scaling factor to be applied
to their corresponding data, although this feature is not accessible through
the indexed assignment syntax and instead requires a method call. This is most
useful when load shapes are shared between counties but magnitudes differ:

.. code:: python

   singlefamilydetached.add_data(
       same_sfd_shape_different_magnitudes,
       ["01001", "01003", "01005"],
       [1.1, 2.3, 6.7])

All data is persisted to disk (not stored in memory) as soon as it is
assigned, so after adding data no further steps are required to save the file.

Additional classes and methods useful for creating new data:

- :class:`~dsgrid.dataformat.enumeration.SingleFuelEndUseEnumeration`
- :class:`~dsgrid.dataformat.enumeration.FuelEnumeration`
- :class:`~dsgrid.dataformat.enumeration.MultiFuelEndUseEnumeration`
- :meth:`~dsgrid.dataformat.sectordataset.SectorDataset.add_data_batch`

Reading in an existing data file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a dsgrid-formatted HDF5 file already exists, it can be read into a
:class:`~dsgrid.dataformat.datafile.Datafile` object:

.. code:: python

   f2 = Datafile.load("data.dsg")

All of the data will then be accessible to Python just as it was when the file
was first created, for example:

.. code:: python

   sfd = f2["res__SingleFamilyDetached"]
   jeffco_sfd = sfd["08059"]

For easier data manipulation, the full contents of the
:class:`~dsgrid.dataformat.datafile.Datafile` can also be read into memory in
a tabular format by creating a
:class:`~dsgrid.dataformat.datatable.Datatable` object:

.. code:: python

   from dsgrid.dataformat.datatable import Datatable

   dt = Datatable(f2)

A :class:`~dsgrid.dataformat.datatable.Datatable` is just a thin wrapper
around a Pandas ``Series`` with a four-level ``MultiIndex``. The
:class:`~dsgrid.dataformat.datatable.Datatable` can be indexed into for quick
access to a relevant subset of the data, or the underlying ``Series`` can be
accessed and manipulated directly.

.. code:: python

   # Accessing a single value
   dt["res__SingleFamilyDetached", "08059", "heating", "2012-04-28 02:00:00-05:00"]

   # Accessing a Series slice
   dt["res__SingleFamilyDetached", "08059", "heating", :]

   # Working directly with the underlying Series
   # (pandas groupby takes the keyword ``level``, not ``levels``)
   sector_enduse_totals = dt.data.groupby(level=["sector", "enduse"]).sum()

Additional methods useful for accessing data:

- :meth:`dsgrid.dataformat.sectordataset.SectorDataset.get_data`

Working with a dsgrid model (collection of data files)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A :class:`dsgrid.model.LoadModel` holds a collection of related datafiles and
tags each one with its :class:`~dsgrid.model.ComponentType` and an optional
color (for plotting). For example, a :class:`~dsgrid.model.LoadModel` can be
formed just from the ComponentType.BOTTOMUP components:

.. code:: python

   from dsgrid.model import ComponentType, LoadModelComponent, LoadModel

   # (name, plot color, file name) for each bottom-up component
   bottomup_components_list = [
       ('Residential', '#F7A11A', 'residential.dsg'),
       ('Commercial', '#5D9732', 'commercial.dsg'),
       ('Industrial', '#D9531E', 'industrial.dsg')]

   # Let datadir be a pathlib.Path pointing to a folder containing .dsg files ...
   components = []
   for name, color, filename in bottomup_components_list:
       components.append(LoadModelComponent(name, component_type=ComponentType.BOTTOMUP, color=color))
       components[-1].load_datafile(datadir / filename)

   model = LoadModel.create(components)

Dimension mappings can be applied to individual :class:`Datafiles `,
individual :class:`LoadModelComponents `, or to an entire :class:`LoadModel`.
For example, this code would aggregate the model defined above to the census
division level:

.. code:: python

   from dsgrid.dataformat.enumeration import census_divisions
   from dsgrid.dataformat.dimmap import mappings

   model.map_dimension(datadir / ".." / "aggregated_to_census_division",
                       census_divisions, mappings)

See ``notebooks/Visualize dsgrid model.ipynb`` for more examples.

Classes, methods and objects useful for working with the dsgrid EFS dataset:

- :class:`dsgrid.model.LoadModel`
- :class:`dsgrid.model.LoadModelComponent`
- :class:`dsgrid.dataformat.dimmap.Mappings` (Also scroll to the bottom of the
  source code file to see the mappings module attribute and how it is
  defined.)
- :class:`dsgrid.dataformat.dimmap.FullAggregationMap`
- :class:`dsgrid.dataformat.dimmap.FilterToSubsetMap`
- :class:`dsgrid.dataformat.dimmap.FilterToSingleFuelMap`
- :class:`dsgrid.dataformat.dimmap.ExplicitAggregation`
- :class:`dsgrid.dataformat.dimmap.UnitConversionMap`
- :meth:`dsgrid.dataformat.datafile.Datafile.map_dimension`
- :meth:`dsgrid.dataformat.datafile.Datafile.scale_data`
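
To make the idea of aggregating to census divisions more concrete, the
following is a conceptual, library-free sketch of summing values over the
geography dimension. It does not use or reproduce the ``dsgrid`` mapping
classes listed above. The function name ``aggregate_geography``, the region
IDs ``"mountain"`` and ``"east_south_central"``, the time ID ``"t0"``, and all
numeric values are hypothetical and chosen only for illustration:

.. code:: python

   from collections import defaultdict

   # Hypothetical mapping from county FIPS code to an aggregate region ID.
   # These region IDs are illustrative, not the IDs in dsgrid's census_divisions.
   COUNTY_TO_DIVISION = {
       "08059": "mountain",            # Jefferson County, Colorado
       "08001": "mountain",            # another Colorado county
       "01001": "east_south_central",  # an Alabama county
   }


   def aggregate_geography(records, mapping):
       """Sum values whose geography IDs map to the same aggregate ID.

       ``records`` maps (sector, geography, enduse, time) tuples to values.
       """
       totals = defaultdict(float)
       for (sector, geography, enduse, time), value in records.items():
           totals[(sector, mapping[geography], enduse, time)] += value
       return dict(totals)


   # Made-up values keyed by (sector, geography, enduse, time)
   records = {
       ("res", "08059", "heating", "t0"): 1.0,
       ("res", "08001", "heating", "t0"): 2.0,
       ("res", "01001", "heating", "t0"): 4.0,
       ("res", "08059", "cooling", "t0"): 0.5,
   }

   by_division = aggregate_geography(records, COUNTY_TO_DIVISION)

In this sketch the two Colorado counties collapse into a single ``"mountain"``
entry per sector, enduse, and time, while the other dimensions are left
untouched.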