Registry

Registry Managers

class dsgrid.registry.registry_manager.RegistryManager(params: RegistryManagerParams, db: RegistryDatabase)

Manages registration of all projects and datasets.

property dataset_manager: DatasetRegistryManager

Return the dataset manager.

property dimension_mapping_manager: DimensionMappingRegistryManager

Return the dimension mapping manager.

property dimension_manager: DimensionRegistryManager

Return the dimension manager.

property project_manager: ProjectRegistryManager

Return the project manager.

classmethod load(conn: DatabaseConnection, remote_path='s3://nrel-dsgrid-registry', use_remote_data=None, offline_mode=False, user=None, no_prompts=False, scratch_dir=None)

Load a registry from the given database connection.

Parameters:
  • conn (DatabaseConnection)

  • remote_path (str, optional) – path of the remote registry; default is REMOTE_REGISTRY

  • use_remote_data (bool | None) – If set, load data tables from remote_path. If not set, auto-determine what to do based on HPC or AWS EMR environment variables.

  • offline_mode (bool) – Load registry in offline mode; default is False

  • user (str) – username

  • no_prompts (bool) – If False, the user is prompted to confirm before sync-pulling the registry when lock files exist.

  • scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.

Return type:

RegistryManager

Examples

>>> from dsgrid.registry.registry_manager import RegistryManager
>>> from dsgrid.registry.registry_database import DatabaseConnection
>>> # Placeholder URL: replace with the URL of your registry database.
>>> manager = RegistryManager.load(
...     DatabaseConnection(url="sqlite:///path/to/registry.db"),
... )

class dsgrid.registry.project_registry_manager.ProjectRegistryManager(path: Path, params, dataset_manager: DatasetRegistryManager, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: ProjectRegistryInterface)

Manages registered projects.

get_by_id(project_id: str, version: str | None = None, conn: Connection | None = None)

Get the item matching the ID. Returns from cache if already loaded.

Parameters:
  • project_id (str)

  • version (str) – If None, return the latest version.

Return type:

DSGBaseModel

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.
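All four managers document the same get_by_id contract: resolve the latest version when version is None, serve repeat lookups from a cache, and raise DSGValueNotRegistered for unknown IDs. The sketch below illustrates that contract with a toy in-memory registry. ToyRegistry, its {config_id: {version: config}} storage layout, and the numeric ordering of dotted versions are assumptions for illustration, not dsgrid's implementation; DSGValueNotRegistered is redefined locally so the snippet runs without dsgrid.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class DSGValueNotRegistered(Exception):
    """Local stand-in for dsgrid's exception of the same name."""


def version_key(version: str) -> tuple[int, ...]:
    # Compare dotted versions numerically so that "1.10.0" sorts after "1.9.0".
    return tuple(int(part) for part in version.split("."))


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry: {config_id: {version: config}}."""

    stored: dict[str, dict[str, str]]
    cache: dict[tuple[str, str], str] = field(default_factory=dict)
    load_count: int = 0  # number of simulated loads from storage

    def get_by_id(self, config_id: str, version: str | None = None) -> str:
        versions = self.stored.get(config_id)
        if not versions:
            raise DSGValueNotRegistered(f"{config_id} is not registered")
        if version is None:
            # No version requested: pick the highest registered version.
            version = max(versions, key=version_key)
        elif version not in versions:
            raise DSGValueNotRegistered(f"{config_id} version {version} is not registered")
        key = (config_id, version)
        if key not in self.cache:
            # First access for this (ID, version): simulate a load and cache the result.
            self.load_count += 1
            self.cache[key] = versions[version]
        return self.cache[key]


registry = ToyRegistry({"my_project": {"1.9.0": "config v1.9.0", "1.10.0": "config v1.10.0"}})
print(registry.get_by_id("my_project"))  # config v1.10.0
registry.get_by_id("my_project")  # served from the cache
print(registry.load_count)  # 1
```

Keying the cache on the resolved (ID, version) pair means a later explicit request for the same version reuses the cached entry.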

load_project(project_id: str, version: str | None = None, conn: Connection | None = None) → Project

Load a project from the registry.

Parameters:
  • project_id (str)

  • version (str)

Return type:

Project

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)

Show the registry in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Maximum column width in the PrettyTable, specified as a single value or as a dict of values keyed by field name

  • drop_fields – List of field names not to show
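The filters argument accepts expressions such as "Submitter==USER" and "Description contains comstock". The sketch below shows one way such expressions could be evaluated against table rows. The two-operator grammar, case-insensitive matching for contains, and combining filters with logical AND are assumptions made for illustration, as are parse_filter, matches, and the sample rows; dsgrid's own parser may apply different rules.

```python
from __future__ import annotations


def parse_filter(expr: str) -> tuple[str, str, str]:
    """Split a filter expression into (field, operator, value)."""
    if "==" in expr:
        field, value = expr.split("==", 1)
        return field.strip(), "==", value.strip()
    if " contains " in expr:
        field, value = expr.split(" contains ", 1)
        return field.strip(), "contains", value.strip()
    raise ValueError(f"Unsupported filter expression: {expr!r}")


def matches(row: dict[str, str], filters: list[str]) -> bool:
    """Return True if the row satisfies every filter (filters are ANDed)."""
    for expr in filters:
        field, op, value = parse_filter(expr)
        cell = row.get(field, "")
        if op == "==" and cell != value:
            return False
        # Assumption: "contains" is a case-insensitive substring test.
        if op == "contains" and value.lower() not in cell.lower():
            return False
    return True


# Hypothetical rows standing in for the registry table.
rows = [
    {"ID": "dataset_a", "Submitter": "USER", "Description": "ComStock-derived sample"},
    {"ID": "dataset_b", "Submitter": "USER", "Description": "ResStock-derived sample"},
    {"ID": "dataset_c", "Submitter": "other", "Description": "ComStock-derived sample"},
]
filters = ["Submitter==USER", "Description contains comstock"]
selected = [row["ID"] for row in rows if matches(row, filters)]
print(selected)  # ['dataset_a']
```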

class dsgrid.registry.dataset_registry_manager.DatasetRegistryManager(path, fs_interface, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: DatasetRegistryInterface)

Manages registered datasets.

get_by_id(dataset_id: str, version=None, conn: Connection | None = None)

Get the item matching the ID. Returns from cache if already loaded.

Parameters:
  • dataset_id (str)

  • version (str) – If None, return the latest version.

Return type:

DSGBaseModel

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)

Show the registry in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Maximum column width in the PrettyTable, specified as a single value or as a dict of values keyed by field name

  • drop_fields – List of field names not to show

class dsgrid.registry.dimension_registry_manager.DimensionRegistryManager(path, params)

Manages registered dimensions.

get_by_id(config_id: str, version: str | None = None, conn: Connection | None = None)

Get the item matching the ID. Returns from cache if already loaded.

Parameters:
  • config_id (str)

  • version (str) – If None, return the latest version.

Return type:

DSGBaseModel

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, dimension_ids: set[str] | None = None, return_table: bool = False, **kwargs)

Show the registry in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Maximum column width in the PrettyTable, specified as a single value or as a dict of values keyed by field name

  • drop_fields – List of field names not to show

class dsgrid.registry.dimension_mapping_registry_manager.DimensionMappingRegistryManager(path, params)

Manages registered dimension mappings.

get_by_id(mapping_id, version=None, conn: Connection | None = None)

Get the item matching the ID. Returns from cache if already loaded.

Parameters:
  • mapping_id (str)

  • version (str) – If None, return the latest version.

Return type:

DSGBaseModel

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)

Show the registry in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Maximum column width in the PrettyTable, specified as a single value or as a dict of values keyed by field name

  • drop_fields – List of field names not to show

pydantic model dsgrid.registry.registry_database.DatabaseConnection

Input information to connect to a registry database.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Fields:
  • url (str)

field url: str [Required]
get_filename() → Path | None

Return the filename from the URL, if file-based, otherwise None.
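get_filename distinguishes file-based registries from server-based ones. A minimal sketch, assuming file-based registries are addressed with SQLAlchemy-style SQLite URLs (sqlite:///relative/path.db or sqlite:////absolute/path.db). The SQLITE_PREFIX constant and this standalone get_filename function are illustrative, not dsgrid's code.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: file-based registries use SQLAlchemy-style SQLite URLs.
SQLITE_PREFIX = "sqlite:///"


def get_filename(url: str) -> Path | None:
    """Return the database file path for a SQLite URL, otherwise None."""
    if not url.startswith(SQLITE_PREFIX):
        return None  # e.g. a server-based database URL
    # "sqlite:///registry.db" -> relative path "registry.db";
    # "sqlite:////data/registry.db" -> absolute path "/data/registry.db".
    return Path(url[len(SQLITE_PREFIX):])


print(get_filename("sqlite:///registry.db"))     # registry.db
print(get_filename("postgresql://host/dbname"))  # None
```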

Project

class dsgrid.project.Project(config, version, dataset_configs, dimension_mgr, dimension_mapping_mgr)

Interface to a dsgrid project.

property version

Return the version of the project.

Return type:

str

get_dataset(dataset_id)

Returns a Dataset. Calls load_dataset if it hasn’t already been loaded.

Parameters:

dataset_id (str)

Return type:

Dataset

class dsgrid.config.project_config.ProjectConfig(model)

Provides an interface to a ProjectConfigModel.

get_base_dimension(dimension_type: DimensionType) → DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles

Return the base dimension matching dimension_type.

get_dimension(dimension_query_name: str) → DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles

Return the dimension config (an instance of DimensionBaseConfig) matching dimension_query_name.

get_dimension_records(dimension_query_name: str) → DataFrame

Return a DataFrame containing the records for a dimension.

list_supplemental_dimensions(dimension_type: DimensionType, sort_by=None) → list[DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles]

Return the supplemental dimensions matching dimension_type (if any).

Parameters:
  • dimension_type (DimensionType)

  • sort_by (str | None) – If set, sort the dimensions by this dimension attribute.
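The sort_by parameter orders the returned dimensions by an attribute name. The sketch below filters a list of dimension configs by type and sorts them with operator.attrgetter. ToyDimension, its name and query_name attributes, and the local DimensionType enum are hypothetical stand-ins for dsgrid's classes, and the input list is assumed to already contain only the project's supplemental dimensions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from operator import attrgetter


class DimensionType(Enum):
    # Local stand-in with two of dsgrid's dimension types.
    GEOGRAPHY = "geography"
    SECTOR = "sector"


@dataclass
class ToyDimension:
    """Hypothetical stand-in for a supplemental dimension config."""

    dimension_type: DimensionType
    name: str
    query_name: str


def list_supplemental_dimensions(
    dimensions: list[ToyDimension],
    dimension_type: DimensionType,
    sort_by: str | None = None,
) -> list[ToyDimension]:
    """Return dimensions of the given type, optionally sorted by an attribute."""
    matching = [d for d in dimensions if d.dimension_type == dimension_type]
    if sort_by is not None:
        # attrgetter("name") reads d.name; an unknown attribute raises AttributeError.
        matching.sort(key=attrgetter(sort_by))
    return matching


supplemental = [
    ToyDimension(DimensionType.GEOGRAPHY, "US States", "state"),
    ToyDimension(DimensionType.GEOGRAPHY, "Census Regions", "census_region"),
    ToyDimension(DimensionType.SECTOR, "Sector Groups", "sector_group"),
]
result = list_supplemental_dimensions(supplemental, DimensionType.GEOGRAPHY, sort_by="name")
print([d.query_name for d in result])  # ['census_region', 'state']
```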

get_base_to_supplemental_dimension_mappings_by_types(dimension_type: DimensionType) → list[MappingTableConfig]

Return the base-to-supplemental dimension mappings for dimension_type (if any).

get_base_to_supplemental_config(dimension_query_name: str) → tuple[ConfigKey, MappingTableConfig]

Return the project’s base-to-supplemental dimension mapping config.

get_base_to_supplemental_mapping_records(dimension_query_name: str) → DataFrame

Return the project’s base-to-supplemental dimension mapping records.

list_dimension_query_names(category: DimensionCategory | None = None) → list[str]

Return query names for all dimensions in the project.

Parameters:

category (DimensionCategory | None) – If set, return only the query names of dimensions in this category.

get_base_dimension_to_query_name_mapping() → dict[DimensionType, str]

Return a mapping of DimensionType to query name for base dimensions.

get_supplemental_dimension_to_query_name_mapping() → dict[DimensionType, list[str]]

Return a mapping of DimensionType to query names for supplemental dimensions.
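The query-name methods above can be read as views over ProjectDimensionQueryNamesModel, whose per-type entries are DimensionsByCategoryModel instances with base, subset, and supplemental fields. The sketch below mirrors that structure with plain dataclasses so it runs without dsgrid or pydantic. The local DimensionType and DimensionCategory enums, the sorted return order of list_dimension_query_names, and the example query names are assumptions for illustration, not dsgrid's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DimensionType(Enum):
    # Local stand-in with two of dsgrid's dimension types.
    GEOGRAPHY = "geography"
    SECTOR = "sector"


class DimensionCategory(Enum):
    # Assumed to mirror the three fields of DimensionsByCategoryModel.
    BASE = "base"
    SUBSET = "subset"
    SUPPLEMENTAL = "supplemental"


@dataclass
class DimensionsByCategory:
    """Dataclass mirror of DimensionsByCategoryModel."""

    base: str
    subset: list[str]
    supplemental: list[str]


QueryNames = dict[DimensionType, DimensionsByCategory]


def list_dimension_query_names(
    query_names: QueryNames, category: DimensionCategory | None = None
) -> list[str]:
    """Return query names for all dimensions, optionally limited to one category."""
    names: list[str] = []
    for by_category in query_names.values():
        if category in (None, DimensionCategory.BASE):
            names.append(by_category.base)
        if category in (None, DimensionCategory.SUBSET):
            names.extend(by_category.subset)
        if category in (None, DimensionCategory.SUPPLEMENTAL):
            names.extend(by_category.supplemental)
    return sorted(names)


def get_base_dimension_to_query_name_mapping(query_names: QueryNames) -> dict[DimensionType, str]:
    # Exactly one base dimension per dimension type.
    return {dim_type: by_cat.base for dim_type, by_cat in query_names.items()}


def get_supplemental_dimension_to_query_name_mapping(
    query_names: QueryNames,
) -> dict[DimensionType, list[str]]:
    # Zero or more supplemental dimensions per dimension type.
    return {dim_type: list(by_cat.supplemental) for dim_type, by_cat in query_names.items()}


# Hypothetical query names for a two-dimension project.
query_names = {
    DimensionType.GEOGRAPHY: DimensionsByCategory("county", [], ["state", "census_region"]),
    DimensionType.SECTOR: DimensionsByCategory("all_sectors", ["commercial_subset"], []),
}
print(list_dimension_query_names(query_names, DimensionCategory.SUPPLEMENTAL))  # ['census_region', 'state']
print(get_base_dimension_to_query_name_mapping(query_names)[DimensionType.GEOGRAPHY])  # county
```

The base mapping returns a single string per type while the supplemental mapping returns a list, which matches the return annotations dict[DimensionType, str] and dict[DimensionType, list[str]] documented above.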

list_registered_dataset_ids() → list[str]

List registered datasets associated with the project.

list_unregistered_dataset_ids() → list[str]

List unregistered datasets associated with the project.

get_required_dimension_record_ids(dataset_id: str, dimension_type: DimensionType) → set[str]

Return the required base dimension record IDs for the dataset and dimension type.

pydantic model dsgrid.config.project_config.ProjectDimensionQueryNamesModel

Defines the query names for all base and supplemental dimensions in the project.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Fields:
field geography: DimensionsByCategoryModel [Required]
field metric: DimensionsByCategoryModel [Required]
field model_year: DimensionsByCategoryModel [Required]
field scenario: DimensionsByCategoryModel [Required]
field sector: DimensionsByCategoryModel [Required]
field subsector: DimensionsByCategoryModel [Required]
field time: DimensionsByCategoryModel [Required]
field weather_year: DimensionsByCategoryModel [Required]
pydantic model dsgrid.config.project_config.DimensionsByCategoryModel

Defines the query names by base and supplemental category.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Fields:
field base: str [Required]
field subset: list[str] [Required]
field supplemental: list[str] [Required]

Dimension

class dsgrid.config.dimension_config.DimensionConfig(*args, **kwargs)

Provides an interface to a DimensionModel.

get_unique_ids()

Return the unique IDs in a dimension’s records.

Returns:

set of str

Return type:

set

get_records_dataframe() → DataFrame

Return the records in a Spark DataFrame. Cached on first call.

Examples

from dsgrid.dimension.base_models import DimensionType
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.registry_database import DatabaseConnection

manager = RegistryManager.load(
    # Placeholder URL: replace with the URL of your registry database.
    DatabaseConnection(url="sqlite:///path/to/registry.db"),
    offline_mode=True,
)
project = manager.project_manager.load_project("dsgrid_conus_2022")
geo_dim = project.config.get_base_dimension(DimensionType.GEOGRAPHY)
geo_dim.get_records_dataframe().show()
print(geo_dim.get_unique_ids())
# Show the records for a supplemental dimension.
project.config.get_dimension_records("commercial_end_uses").show()