Getting Started

This is a basic overview of how to use the dsgrid-legacy-efs-api package. More extensive examples can be found throughout the notebooks and tests.

For accessing the dsgrid EFS data, the sections on reading data files and working with dsgrid models are probably of most interest, but the brief primer on creating new data files may provide useful background. Also note that although a .dsg extension is used for the dsgrid EFS data files, the underlying format is plain HDF5, so the files can be browsed with a standard viewer such as HDFView.

Installation

To get the basic package, run:

pip install dsgrid-legacy-efs-api

If you would like to run the example notebooks and browse the files available through the Open Energy Data Initiative (OEDI), install the required extra dependencies:

pip install dsgrid-legacy-efs-api[ntbks,oedi]

and also clone the repository. Then you should be able to run the .ipynb files in the dsgrid-legacy-efs-api/notebooks folder, which include functionality for directly browsing the OEDI oedi-data-lake/dsgrid-2018-efs data files. If you would like to use the HSDS service, please see the configuration instructions at https://github.com/NREL/hsds-examples/.

Creating a new data file

To begin, create an empty Datafile object. This involves providing a file path for the HDF5 file that will be created, and a set of valid sector, geography, enduse, and time enumerations. An Enumeration includes both a list of unique IDs identifying individual allowed values, as well as a matching list of more descriptive names. The package includes predefined Enumerations for sector model data.
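As a rough illustration of the idea (not the package's actual class), an Enumeration pairs a list of unique IDs with a matching list of descriptive names:

```python
# Illustrative sketch only; the dsgrid Enumeration classes wrap this idea.
# IDs are unique machine-readable keys; names are human-readable labels.
ids = ["res__SingleFamilyDetached", "com__Hospital"]
names = ["Residential: Single-Family Detached", "Commercial: Hospital"]

# The two lists line up position by position.
id_to_name = dict(zip(ids, names))
```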

from dsgrid.dataformat.datafile import Datafile
from dsgrid.dataformat.enumeration import (
    sectors_subsectors, counties, enduses, hourly2012
)

f = Datafile("data.dsg", sectors_subsectors, counties, enduses, hourly2012)

A SectorDataset can now be added to the Datafile. Note that here "sector" refers to both levels of the sector/subsector hierarchy. This keeps the format extensible to less resolved datasets, where data may only be available by aggregate sector or even economy-wide.

The following would create a sector dataset that spans all enduses and time periods, assuming the provided sector ID exists in f’s SectorEnumeration:

f.add_sector("res__SingleFamilyDetached")

However, it’s likely that a single sector/subsector will not be drawing load for all possible end uses. In that case, to save space on disk, the sector can be defined to use only a subset of the end-uses listed in the Datafile's EndUseEnumerationBase ID list:

singlefamilydetached = f.add_sector("res__SingleFamilyDetached",
                                    enduses=["heating", "cooling", "interior_lights"])

One could restrict the dataset to a subset of times in a similar fashion.

Simulation data can now be assigned to the sector (subsector). The data should be in the form of a Pandas DataFrame with row indices corresponding to IDs in the Datafile's TimeEnumeration and column names corresponding to enduse IDs in the Datafile's EndUseEnumeration (or the predetermined subset discussed immediately above). Each DataFrame is assigned to one or more geographies, represented by IDs in the Datafile's GeographyEnumeration. In this case, "08059" is the ID and FIPS code for Jefferson County, Colorado:
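For instance, a minimal DataFrame with the expected layout might look like the following; the values and timestamps are made up, and only the index/column structure matters:

```python
import pandas as pd

# Rows are indexed by IDs from the TimeEnumeration; columns are enduse IDs
# (here, the subset chosen for this sector above). Values are hypothetical.
jeffco_sfd_data = pd.DataFrame(
    [[1.2, 0.3, 0.1],
     [1.1, 0.4, 0.1]],
    index=["2012-04-28 01:00:00-05:00", "2012-04-28 02:00:00-05:00"],
    columns=["heating", "cooling", "interior_lights"],
)
```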

singlefamilydetached["08059"] = jeffco_sfd_data
singlefamilydetached[["08001", "08003", "08005"]] = same_sfd_data_in_many_counties

Individual geographies can be associated with a scaling factor that is applied to their corresponding data, although this feature is not accessible through the indexed-assignment syntax and instead requires a method call. It is most useful when load shapes are shared between counties but magnitudes differ:

singlefamilydetached.add_data(same_sfd_shape_different_magnitudes,
                              ["01001", "01003", "01005"], [1.1, 2.3, 6.7])
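Conceptually, each listed geography's data is the shared shape multiplied by its scaling factor. The package applies the factors internally, but the arithmetic amounts to this sketch (shape values are hypothetical; the factors match the call above):

```python
import pandas as pd

# Hypothetical shared load shape and the per-county factors from above.
shape = pd.Series([1.0, 0.5, 0.25])
factors = {"01001": 1.1, "01003": 2.3, "01005": 6.7}

# Effective data for each county is the shared shape times its factor.
county_data = {fips: shape * k for fips, k in factors.items()}
```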

All data is persisted to disk (not stored in memory) as soon as it is assigned, so after adding data no further steps are required to save out the file.

Reading in an existing data file

If a dsgrid-formatted HDF5 file already exists, it can be read into a Datafile object:

f2 = Datafile.load("data.dsg")

All of the data will then be accessible to Python just as it was when the file was first created, for example:

sfd = f2["res__SingleFamilyDetached"]
jeffco_sfd = sfd["08059"]

For easier data manipulation, the full contents of the Datafile can also be read into memory in a tabular format by creating a Datatable object:

from dsgrid.dataformat.datatable import Datatable
dt = Datatable(f2)

A Datatable is just a thin wrapper around a Pandas Series with a four-level MultiIndex. The Datatable can be indexed into for quick access to a relevant subset of the data, or the underlying Series can be accessed and manipulated directly.

# Accessing a single value
dt["res__SingleFamilyDetached", "08059", "heating", "2012-04-28 02:00:00-05:00"]

# Accessing a Series slice
dt["res__SingleFamilyDetached", "08059", "heating", :]

# Working directly with the underlying Series
sector_enduse_totals = dt.data.groupby(level=["sector", "enduse"]).sum()
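A Datatable is not needed to experiment with this layout; the access patterns above can be mimicked with a toy Series carrying the same four-level MultiIndex (values and the "t0" time ID are hypothetical):

```python
import pandas as pd

# Toy stand-in for dt.data: a Series with a four-level MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [("res__SingleFamilyDetached", "08059", "heating", "t0"),
     ("res__SingleFamilyDetached", "08059", "cooling", "t0")],
    names=["sector", "geography", "enduse", "time"],
)
data = pd.Series([1.2, 0.3], index=idx)

# Single value by full key; aggregation over selected index levels.
value = data["res__SingleFamilyDetached", "08059", "heating", "t0"]
totals = data.groupby(level=["sector", "enduse"]).sum()
```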

Working with a dsgrid model (collection of data files)

A dsgrid.model.LoadModel holds a collection of related datafiles and tags each one with its ComponentType and an optional color (for plotting). For example, a LoadModel can be formed just from the ComponentType.BOTTOMUP components:

from dsgrid.model import ComponentType, LoadModelComponent, LoadModel

bottomup_components_list = [
    ('Residential','#F7A11A','residential.dsg'),
    ('Commercial','#5D9732','commercial.dsg'),
    ('Industrial','#D9531E','industrial.dsg')]

# Let datadir be a pathlib.Path pointing to a folder containing .dsg files ...
components = []
for name, color, filename in bottomup_components_list:
    components.append(LoadModelComponent(name, component_type=ComponentType.BOTTOMUP, color=color))
    components[-1].load_datafile(datadir / filename)
model = LoadModel.create(components)

Dimension mappings can be applied to individual Datafiles, individual LoadModelComponents, or to an entire LoadModel. For example, this code would aggregate the model defined above to the census division level:

from dsgrid.dataformat.enumeration import census_divisions
from dsgrid.dataformat.dimmap import mappings

model.map_dimension(datadir / ".." / "aggregated_to_census_division", census_divisions, mappings)
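Conceptually, a geographic dimension mapping like this aggregates values from the finer enumeration into the coarser one. The dimmap machinery does this per Datafile, but the core operation is a grouped sum, sketched here with hypothetical county values (Colorado counties fall in the Mountain division, Alabama's 01001 in East South Central):

```python
import pandas as pd

# Hypothetical per-county values and a county -> census-division mapping.
county_values = pd.Series({"08059": 1.2, "08001": 0.8, "01001": 2.0})
county_to_division = {
    "08059": "Mountain",
    "08001": "Mountain",
    "01001": "East South Central",
}

# Aggregate to the coarser geography by summing within each group.
division_values = county_values.groupby(county_to_division).sum()
```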

See notebooks/Visualize dsgrid model.ipynb for more examples.
