Registry
Registry Managers
- class dsgrid.registry.registry_manager.RegistryManager(params: RegistryManagerParams, db: RegistryDatabase)
Manages registration of all projects and datasets.
- property dataset_manager: DatasetRegistryManager
Return the dataset manager.
- property dimension_mapping_manager: DimensionMappingRegistryManager
Return the dimension mapping manager.
- property dimension_manager: DimensionRegistryManager
Return the dimension manager.
- property project_manager: ProjectRegistryManager
Return the project manager.
- classmethod load(conn: DatabaseConnection, remote_path='s3://nrel-dsgrid-registry', use_remote_data=None, offline_mode=False, user=None, no_prompts=False, scratch_dir=None)
Load a registry through the given database connection.
- Parameters:
conn (DatabaseConnection) – connection to the registry database
remote_path (str, optional) – path of the remote registry; default is REMOTE_REGISTRY
use_remote_data (bool, None) – If set, load data tables from remote_path. If not set, auto-determine what to do based on HPC or AWS EMR environment variables.
offline_mode (bool) – Load the registry in offline mode; default is False
user (str) – username
no_prompts (bool) – If False, prompt the user before continuing to sync-pull the registry when lock files exist.
scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.
- Return type:
RegistryManager
Examples
>>> from dsgrid.registry.registry_manager import RegistryManager
>>> from dsgrid.registry.registry_database import DatabaseConnection
>>> manager = RegistryManager.load(
...     DatabaseConnection(
...         hostname="dsgrid-registry.hpc.nrel.gov",
...         database="standard-scenarios",
...     )
... )
- class dsgrid.registry.project_registry_manager.ProjectRegistryManager(path: Path, params, dataset_manager: DatasetRegistryManager, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: ProjectRegistryInterface)
Manages registered projects.
- get_by_id(project_id, version=None)
Get the item matching the ID. Returns from cache if already loaded.
- Parameters:
project_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not registered.
- load_project(project_id: str, version=None) → Project
Load a project from the registry.
- Parameters:
project_id (str)
version (str)
- Return type:
Project
- show(filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)
Show the registry in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
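The filters parameter above accepts expressions such as “Submitter==USER” and “Description contains comstock”. As an illustration of that syntax only, the following sketch evaluates the two documented forms against rows of registry metadata held in plain dicts. It is hypothetical and not dsgrid's implementation: the helper names parse_filter and apply_filters, the row layout, and the rule that every filter must match (logical AND) are assumptions.

```python
from __future__ import annotations

import re

# Hypothetical sketch, NOT dsgrid's implementation: supports the two forms shown
# in the docstring, "Field==value" (exact match) and "Field contains text" (substring).
_FILTER = re.compile(r"^\s*(?P<field>\w+)\s*(?P<op>==|contains)\s*(?P<value>.+?)\s*$")


def parse_filter(expr: str) -> tuple[str, str, str]:
    """Split a filter expression into (field, operator, value)."""
    match = _FILTER.match(expr)
    if match is None:
        raise ValueError(f"Unsupported filter expression: {expr!r}")
    return match["field"], match["op"], match["value"]


def apply_filters(rows: list[dict[str, str]], filters: list[str]) -> list[dict[str, str]]:
    """Return the rows that satisfy every filter (logical AND is an assumption)."""
    parsed = [parse_filter(f) for f in filters]
    result = []
    for row in rows:
        keep = True
        for field, op, value in parsed:
            cell = str(row.get(field, ""))
            if op == "==" and cell != value:
                keep = False
            elif op == "contains" and value not in cell:
                keep = False
        if keep:
            result.append(row)
    return result


# Hypothetical registry rows for illustration.
rows = [
    {"ID": "comstock_2022", "Submitter": "alice", "Description": "comstock baseline"},
    {"ID": "resstock_2022", "Submitter": "alice", "Description": "resstock baseline"},
    {"ID": "tempo_2022", "Submitter": "bob", "Description": "transportation"},
]
selected = apply_filters(rows, ["Submitter==alice", "Description contains comstock"])
print([r["ID"] for r in selected])  # → ['comstock_2022']
```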
- class dsgrid.registry.dataset_registry_manager.DatasetRegistryManager(path, fs_interface, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: DatasetRegistryInterface)
Manages registered datasets.
- get_by_id(dataset_id: str, version=None)
Get the item matching the ID. Returns from cache if already loaded.
- Parameters:
dataset_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not registered.
- show(filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)
Show the registry in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
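Each get_by_id method resolves version=None to the latest registered version and serves repeat lookups from a cache. A minimal sketch of that lookup pattern follows, assuming versions are dotted numeric strings such as "1.2.0" that must be compared numerically. VersionedCache, its loader callback, and the local DSGValueNotRegistered class are hypothetical stand-ins, not dsgrid code.

```python
from __future__ import annotations

from typing import Callable


class DSGValueNotRegistered(Exception):
    """Local stand-in for dsgrid's exception of the same name."""


def _version_key(version: str) -> tuple[int, ...]:
    # Compare "1.10.0" > "1.2.0" numerically instead of as plain strings.
    return tuple(int(part) for part in version.split("."))


class VersionedCache:
    """Hypothetical lookup with latest-version resolution and a load cache."""

    def __init__(self, versions: dict[str, list[str]], loader: Callable[[str, str], dict]):
        self._versions = versions  # config ID -> registered version strings
        self._loader = loader  # the expensive load, e.g. a database query
        self._cache: dict[tuple[str, str], dict] = {}

    def get_by_id(self, config_id: str, version: str | None = None) -> dict:
        if config_id not in self._versions:
            raise DSGValueNotRegistered(f"{config_id} is not registered")
        if version is None:
            # No version requested: resolve to the latest registered one.
            version = max(self._versions[config_id], key=_version_key)
        elif version not in self._versions[config_id]:
            raise DSGValueNotRegistered(f"{config_id} {version} is not registered")
        key = (config_id, version)
        if key not in self._cache:
            self._cache[key] = self._loader(config_id, version)
        return self._cache[key]


loads: list[tuple[str, str]] = []


def load(config_id: str, version: str) -> dict:
    loads.append((config_id, version))  # record each real (uncached) load
    return {"id": config_id, "version": version}


registry = VersionedCache({"comstock": ["1.2.0", "1.10.0"]}, load)
print(registry.get_by_id("comstock")["version"])  # → 1.10.0
registry.get_by_id("comstock", "1.10.0")  # served from the cache
print(len(loads))  # → 1
```

Keying the cache on (config_id, version) keeps different versions of the same ID separate, and comparing parsed tuples avoids the string-ordering trap where "1.10.0" sorts before "1.2.0".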
- class dsgrid.registry.dimension_registry_manager.DimensionRegistryManager(path, params)
Manages registered dimensions.
- get_by_id(config_id, version=None)
Get the item matching the ID. Returns from cache if already loaded.
- Parameters:
config_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not registered.
- show(filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, dimension_ids: set[str] | None = None, return_table: bool = False, **kwargs)
Show the registry in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
- class dsgrid.registry.dimension_mapping_registry_manager.DimensionMappingRegistryManager(path, params)
Manages registered dimension mappings.
- get_by_id(mapping_id, version=None)
Get the item matching the ID. Returns from cache if already loaded.
- Parameters:
mapping_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not registered.
- show(filters: list[str] = None, max_width: int | dict | None = None, drop_fields: list[str] = None, return_table: bool = False, **kwargs)
Show the registry in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
- pydantic model dsgrid.registry.registry_database.DatabaseConnection
Input information to connect to a registry database.
Create a new model by parsing and validating input data from keyword arguments. Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
- field database: str = 'dsgrid'
- field hostname: str = 'localhost'
- field password: str = 'openSesame'
- field port: int = 8529
- field username: str = 'root'
- property url: str
Return the URL of the connection.
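DatabaseConnection bundles the fields above, and url derives a connection string from them. The sketch below mirrors the documented fields and defaults in a stdlib dataclass so that it runs without dsgrid or pydantic installed. ConnectionSketch is a hypothetical name, and the http://hostname:port URL format is an assumption; check the real url property before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSketch:
    """Stdlib mirror of DatabaseConnection's documented fields and defaults."""

    database: str = "dsgrid"
    hostname: str = "localhost"
    password: str = "openSesame"
    port: int = 8529
    username: str = "root"

    @property
    def url(self) -> str:
        # Assumed format; the real DatabaseConnection.url may differ.
        return f"http://{self.hostname}:{self.port}"


default = ConnectionSketch()
print(default.url)  # → http://localhost:8529

# Override only hostname and database, as the examples on this page do.
remote = ConnectionSketch(hostname="dsgrid-registry.hpc.nrel.gov", database="standard-scenarios")
print(remote.url, remote.database)
```

The examples on this page pass only hostname and database, so the remaining fields keep their defaults.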
Project
- class dsgrid.project.Project(config, version, dataset_configs, dimension_mgr, dimension_mapping_mgr)
Interface to a dsgrid project.
- property version
Return the version of the project.
- Return type:
str
- class dsgrid.config.project_config.ProjectConfig(model)
Provides an interface to a ProjectConfigModel.
- get_base_dimension(dimension_type: DimensionType) → DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles
Return the base dimension matching dimension_type.
- get_dimension(dimension_query_name: str) → DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles
Return the dimension config (an instance of DimensionBaseConfig) matching dimension_query_name.
- get_dimension_records(dimension_query_name: str) → DataFrame
Return a DataFrame containing the records for the dimension matching dimension_query_name.
- list_supplemental_dimensions(dimension_type: DimensionType, sort_by=None) → list[DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles]
Return the supplemental dimensions matching dimension_type (if any).
- Parameters:
dimension_type (DimensionType)
sort_by (str | None) – If set, sort the dimensions by this dimension attribute.
- get_base_to_supplemental_dimension_mappings_by_types(dimension_type: DimensionType) → list[MappingTableConfig]
Return the base-to-supplemental dimension mappings for dimension_type (if any).
- get_base_to_supplemental_config(dimension_query_name: str) → tuple[ConfigKey, MappingTableConfig]
Return the project’s base-to-supplemental dimension mapping config.
- get_base_to_supplemental_mapping_records(dimension_query_name: str) → DataFrame
Return the project’s base-to-supplemental dimension mapping records.
- list_dimension_query_names(category: DimensionCategory | None = None) → list[str]
Return query names for all dimensions in the project.
- Parameters:
category (DimensionCategory | None) – If set, return only the query names of dimensions in this category.
- get_base_dimension_to_query_name_mapping() → dict[DimensionType, str]
Return a mapping of DimensionType to query name for base dimensions.
- get_supplemental_dimension_to_query_name_mapping() → dict[DimensionType, list[str]]
Return a mapping of DimensionType to query names for supplemental dimensions.
- list_registered_dataset_ids() → list[str]
List registered datasets associated with the project.
- list_unregistered_dataset_ids() → list[str]
List unregistered datasets associated with the project.
- get_required_dimension_record_ids(dataset_id: str, dimension_type: DimensionType) → set[str]
Return the required base dimension record IDs for the dataset and dimension type.
- pydantic model dsgrid.config.project_config.ProjectDimensionQueryNamesModel
Defines the query names for all base and supplemental dimensions in the project.
- Fields:
geography (dsgrid.config.project_config.DimensionsByCategoryModel)
metric (dsgrid.config.project_config.DimensionsByCategoryModel)
model_year (dsgrid.config.project_config.DimensionsByCategoryModel)
scenario (dsgrid.config.project_config.DimensionsByCategoryModel)
sector (dsgrid.config.project_config.DimensionsByCategoryModel)
subsector (dsgrid.config.project_config.DimensionsByCategoryModel)
time (dsgrid.config.project_config.DimensionsByCategoryModel)
weather_year (dsgrid.config.project_config.DimensionsByCategoryModel)
- field geography: DimensionsByCategoryModel [Required]
- field metric: DimensionsByCategoryModel [Required]
- field model_year: DimensionsByCategoryModel [Required]
- field scenario: DimensionsByCategoryModel [Required]
- field sector: DimensionsByCategoryModel [Required]
- field subsector: DimensionsByCategoryModel [Required]
- field time: DimensionsByCategoryModel [Required]
- field weather_year: DimensionsByCategoryModel [Required]
- pydantic model dsgrid.config.project_config.DimensionsByCategoryModel
Defines the query names by base, subset, and supplemental category.
- field base: str [Required]
- field subset: list[str] [Required]
- field supplemental: list[str] [Required]
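ProjectDimensionQueryNamesModel holds one DimensionsByCategoryModel per dimension type, each naming one base query name plus lists of subset and supplemental query names. The sketch below mirrors that shape with stdlib dataclasses and flattens it in the spirit of list_dimension_query_names and get_base_dimension_to_query_name_mapping. The plain-string dimension types and categories ("base", "subset", "supplemental"), the helper list_query_names, and the example query names are assumptions for illustration; the real methods use DimensionType and DimensionCategory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DimensionsByCategory:
    """Stdlib mirror of DimensionsByCategoryModel (all fields required)."""

    base: str
    subset: list[str]
    supplemental: list[str]


def list_query_names(
    names_by_type: dict[str, DimensionsByCategory], category: str | None = None
) -> list[str]:
    """Flatten query names across dimension types, optionally for one category."""
    result: list[str] = []
    for by_category in names_by_type.values():
        if category in (None, "base"):
            result.append(by_category.base)
        if category in (None, "subset"):
            result.extend(by_category.subset)
        if category in (None, "supplemental"):
            result.extend(by_category.supplemental)
    return result


# Hypothetical query names, keyed by dimension type.
names = {
    "geography": DimensionsByCategory(
        base="county", subset=[], supplemental=["state", "census_division"]
    ),
    "sector": DimensionsByCategory(base="sector", subset=[], supplemental=[]),
}
# Base query name per dimension type, analogous to the base-dimension mapping.
base_mapping = {dim_type: c.base for dim_type, c in names.items()}

print(list_query_names(names))  # → ['county', 'state', 'census_division', 'sector']
print(list_query_names(names, category="supplemental"))  # → ['state', 'census_division']
print(base_mapping)  # → {'geography': 'county', 'sector': 'sector'}
```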
Dimension
- class dsgrid.config.dimension_config.DimensionConfig(*args, **kwargs)
Provides an interface to a DimensionModel.
- get_unique_ids()
Return the unique IDs in a dimension’s records.
- Returns:
set of str
- Return type:
set
- get_records_dataframe() → DataFrame
Return the records in a Spark DataFrame. Cached on first call.
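get_records_dataframe reads a dimension's records once and caches the result, and get_unique_ids returns the unique IDs in those records. The sketch below shows that read-once pattern with plain Python lists instead of Spark. RecordsSketch, read_county_records, and the assumption that each record carries an "id" field are hypothetical; the real get_records_dataframe returns a Spark DataFrame.

```python
from __future__ import annotations

from typing import Callable


class RecordsSketch:
    """Hypothetical stand-in for DimensionConfig's record access, without Spark."""

    def __init__(self, read_records: Callable[[], list[dict[str, str]]]):
        self._read_records = read_records
        self._records: list[dict[str, str]] | None = None

    def get_records(self) -> list[dict[str, str]]:
        # Read on the first call only; later calls return the cached list.
        if self._records is None:
            self._records = self._read_records()
        return self._records

    def get_unique_ids(self) -> set[str]:
        # Assumes every record has an "id" field.
        return {record["id"] for record in self.get_records()}


reads: list[int] = []


def read_county_records() -> list[dict[str, str]]:
    reads.append(1)  # count how often the (expensive) read happens
    return [
        {"id": "01001", "name": "Autauga County"},
        {"id": "01003", "name": "Baldwin County"},
    ]


dim = RecordsSketch(read_county_records)
print(sorted(dim.get_unique_ids()))  # → ['01001', '01003']
dim.get_records()
print(len(reads))  # → 1
```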
Examples
from dsgrid.dimension.base_models import DimensionType
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.registry_database import DatabaseConnection
manager = RegistryManager.load(
DatabaseConnection(
hostname="dsgrid-registry.hpc.nrel.gov",
database="standard-scenarios",
),
offline_mode=True,
)
project = manager.project_manager.load_project("dsgrid_conus_2022")
geo_dim = project.config.get_base_dimension(DimensionType.GEOGRAPHY)
geo_dim.get_records_dataframe().show()
print(geo_dim.get_unique_ids())
# Show the records for a supplemental dimension.
project.config.get_dimension_records("commercial_end_uses").show()