Registry
Registry Managers
- class dsgrid.registry.registry_manager.RegistryManager(params: RegistryManagerParams, db: RegistryDatabase)
Manages registration of all projects and datasets.
- property dataset_manager: DatasetRegistryManager
Return the dataset manager.
- property dimension_mapping_manager: DimensionMappingRegistryManager
Return the dimension mapping manager.
- property dimension_manager: DimensionRegistryManager
Return the dimension manager.
- property project_manager: ProjectRegistryManager
Return the project manager.
- classmethod load(conn: DatabaseConnection, remote_path='s3://nrel-dsgrid-registry', use_remote_data=None, offline_mode=True, user=None, no_prompts=False, scratch_dir=None)
Load a registry from the given database connection.
- Parameters:
conn (DatabaseConnection) – connection information for the registry database
remote_path (str, optional) – path of the remote registry; default is REMOTE_REGISTRY
use_remote_data (bool | None) – If set, load data tables from remote_path. If not set, determine what to do automatically based on HPC or AWS EMR environment variables.
offline_mode (bool) – Load the registry in offline mode; default is True
user (str) – username
no_prompts (bool) – If False, prompt the user whether to continue sync-pulling the registry when lock files exist.
scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.
- Return type:
RegistryManager
Examples
>>> from dsgrid.registry.registry_manager import RegistryManager
>>> from dsgrid.registry.registry_database import DatabaseConnection
>>> manager = RegistryManager.load(
...     DatabaseConnection(
...         hostname="dsgrid-registry.hpc.nrel.gov",
...         database="standard-scenarios",
...     )
... )
- class dsgrid.registry.project_registry_manager.ProjectRegistryManager(path: Path, params, dataset_manager: DatasetRegistryManager, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: ProjectRegistryInterface)
Manages registered projects.
- get_by_id(project_id: str, version: str | None = None, conn: Connection | None = None)
Get the item matching the ID. Returns it from the cache if it is already loaded.
- Parameters:
project_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
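Every manager's get_by_id follows the same contract: a missing version resolves to the latest one, repeat lookups are served from a cache, and unknown IDs raise. The sketch below models that contract in plain Python. `CachingRegistry` and `ValueNotRegistered` are hypothetical stand-ins, not dsgrid classes; it assumes versions are "major.minor.patch" strings, and dsgrid itself reads from the registry database rather than from a dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ValueNotRegistered(Exception):
    """Hypothetical stand-in for dsgrid's DSGValueNotRegistered."""


def _version_key(version: str) -> tuple[int, ...]:
    # Assumes simple "major.minor.patch" version strings.
    return tuple(int(part) for part in version.split("."))


@dataclass
class CachingRegistry:
    """Hypothetical in-memory registry; not dsgrid's implementation."""

    # config ID -> {version -> stored item}
    store: dict[str, dict[str, object]]
    cache: dict[tuple[str, str], object] = field(default_factory=dict)
    loads: int = 0  # number of cache misses (reads from the store)

    def get_by_id(self, config_id: str, version: str | None = None) -> object:
        versions = self.store.get(config_id)
        if not versions:
            raise ValueNotRegistered(f"{config_id} is not registered")
        if version is None:
            # Compare numerically so that "1.10.0" sorts after "1.2.0".
            version = max(versions, key=_version_key)
        elif version not in versions:
            raise ValueNotRegistered(f"{config_id} version {version} is not registered")
        key = (config_id, version)
        if key not in self.cache:
            self.loads += 1
            self.cache[key] = versions[version]
        return self.cache[key]


registry = CachingRegistry({"comstock": {"1.0.0": "v1", "1.2.0": "v1.2", "1.10.0": "v1.10"}})
print(registry.get_by_id("comstock"))  # latest version: v1.10
print(registry.get_by_id("comstock"), registry.loads)  # served from the cache: v1.10 1
```

Comparing versions as integer tuples rather than raw strings avoids picking "1.2.0" over "1.10.0".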
- load_project(project_id: str, version: str | None = None, conn: Connection | None = None) → Project
Load a project from the registry.
- Parameters:
project_id (str)
version (str | None) – If None, load the latest version.
- Return type:
Project
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)
Show the registry contents in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Maximum column width in the PrettyTable, specified as a single value for all columns or as a dict of values keyed by field name
drop_fields – List of field names not to show
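The filters parameter takes string expressions such as "Submitter==USER" and "Description contains comstock". The sketch below shows one way such expressions could be parsed and applied to table rows. It handles only the two forms in the docstring example, and it assumes that filters are ANDed together and that "contains" is a case-insensitive substring test; dsgrid's actual filter grammar and semantics may differ, and `parse_filter`, `row_matches`, and `apply_filters` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Handles only the two forms shown in the docstring examples:
# "<field>==<value>" (exact match) and "<field> contains <value>" (substring).
_FILTER_RE = re.compile(r"^\s*(?P<field>[\w ]+?)\s*(?P<op>==|contains)\s*(?P<value>.+?)\s*$")


def parse_filter(expr: str) -> tuple[str, str, str]:
    match = _FILTER_RE.match(expr)
    if match is None:
        raise ValueError(f"unsupported filter expression: {expr!r}")
    return match["field"], match["op"], match["value"]


def row_matches(row: dict[str, str], field: str, op: str, value: str) -> bool:
    cell = str(row.get(field, ""))
    if op == "==":
        return cell == value
    # Assumption: "contains" is a case-insensitive substring test.
    return value.lower() in cell.lower()


def apply_filters(rows: list[dict[str, str]], filters: list[str]) -> list[dict[str, str]]:
    """Keep the rows that satisfy every filter (filters are ANDed together)."""
    parsed = [parse_filter(expr) for expr in filters]
    return [row for row in rows if all(row_matches(row, *p) for p in parsed)]


rows = [
    {"Submitter": "alice", "Description": "ComStock commercial loads"},
    {"Submitter": "bob", "Description": "ResStock residential loads"},
]
print(apply_filters(rows, ["Description contains comstock"]))
```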
- class dsgrid.registry.dataset_registry_manager.DatasetRegistryManager(path, fs_interface, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: DatasetRegistryInterface)
Manages registered datasets.
- get_by_id(dataset_id: str, version=None, conn: Connection | None = None)
Get the item matching the ID. Returns it from the cache if it is already loaded.
- Parameters:
dataset_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)
Show the registry contents in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Maximum column width in the PrettyTable, specified as a single value for all columns or as a dict of values keyed by field name
drop_fields – List of field names not to show
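Because max_width accepts either one width for every column or a dict keyed by field name, both forms have to be expanded to a per-field width before rendering. The sketch below shows that normalization. It assumes that fields missing from the dict are left unlimited and that overlong values are wrapped onto several lines; both are assumptions about rendering behavior, and `normalize_max_width` and `format_cell` are hypothetical helpers, not dsgrid functions.

```python
from __future__ import annotations

import textwrap


def normalize_max_width(
    field_names: list[str], max_width: int | dict[str, int] | None
) -> dict[str, int | None]:
    """Expand max_width into one entry per field; None means no limit."""
    if max_width is None:
        return {name: None for name in field_names}
    if isinstance(max_width, dict):
        # Assumption: fields missing from the dict are left unlimited.
        return {name: max_width.get(name) for name in field_names}
    return {name: max_width for name in field_names}


def format_cell(value: str, width: int | None) -> str:
    # Wrap long values onto multiple lines instead of cutting them off.
    return value if width is None else textwrap.fill(value, width)


widths = normalize_max_width(["ID", "Description"], {"Description": 7})
print(widths)  # {'ID': None, 'Description': 7}
print(format_cell("abc def ghi", widths["Description"]))
```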
- class dsgrid.registry.dimension_registry_manager.DimensionRegistryManager(path, params)
Manages registered dimensions.
- get_by_id(config_id: str, version: str | None = None, conn: Connection | None = None)
Get the item matching the ID. Returns it from the cache if it is already loaded.
- Parameters:
config_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, dimension_ids: set[str] | None = None, return_table: bool = False, **kwargs)
Show the registry contents in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Maximum column width in the PrettyTable, specified as a single value for all columns or as a dict of values keyed by field name
drop_fields – List of field names not to show
- class dsgrid.registry.dimension_mapping_registry_manager.DimensionMappingRegistryManager(path, params)
Manages registered dimension mappings.
- get_by_id(mapping_id, version=None, conn: Connection | None = None)
Get the item matching the ID. Returns it from the cache if it is already loaded.
- Parameters:
mapping_id (str)
version (str) – If None, return the latest version.
- Return type:
DSGBaseModel
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)
Show the registry contents in a PrettyTable.
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=[“Submitter==USER”, “Description contains comstock”])
max_width – Maximum column width in the PrettyTable, specified as a single value for all columns or as a dict of values keyed by field name
drop_fields – List of field names not to show
- pydantic model dsgrid.registry.registry_database.DatabaseConnection
Input information to connect to a registry database.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
- Fields:
url (str)
- field url: str [Required]
Project
- class dsgrid.project.Project(config: ProjectConfig, version: str, dataset_configs, dimension_mgr: DimensionRegistryManager, dimension_mapping_mgr: DimensionMappingRegistryManager)
Interface to a dsgrid project.
- property version
Return the version of the project.
- Return type:
str
- class dsgrid.config.project_config.ProjectConfig(model: ProjectConfigModel)
Provides an interface to a ProjectConfigModel.
- get_base_dimension(dimension_type: DimensionType, dimension_name: str | None = None) → DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles
Return the base dimension matching dimension_type. If there is more than one base dimension of the given type, dimension_name is required.
See also
list_base_dimensions
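The rule for get_base_dimension is that dimension_type alone suffices when the project has a single base dimension of that type, and dimension_name becomes mandatory when there are several. The sketch below illustrates that selection logic with hypothetical stand-in types (`DimType`, `BaseDimension`); raising ValueError is a placeholder, since dsgrid may use its own exception class.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DimType(Enum):
    """Hypothetical stand-in for dsgrid's DimensionType."""

    GEOGRAPHY = "geography"
    SECTOR = "sector"


@dataclass(frozen=True)
class BaseDimension:
    """Hypothetical stand-in for a base dimension config."""

    dimension_type: DimType
    name: str


def get_base_dimension(
    dims: list[BaseDimension], dimension_type: DimType, dimension_name: str | None = None
) -> BaseDimension:
    matches = [d for d in dims if d.dimension_type == dimension_type]
    if dimension_name is not None:
        matches = [d for d in matches if d.name == dimension_name]
    elif len(matches) > 1:
        # Ambiguous: the caller must say which base dimension it wants.
        names = sorted(d.name for d in matches)
        raise ValueError(f"{dimension_type.value} has base dimensions {names}; pass dimension_name")
    if not matches:
        raise ValueError(f"no matching base dimension for {dimension_type.value}")
    return matches[0]


dims = [
    BaseDimension(DimType.GEOGRAPHY, "county"),
    BaseDimension(DimType.SECTOR, "commercial_sectors"),
    BaseDimension(DimType.SECTOR, "residential_sectors"),
]
print(get_base_dimension(dims, DimType.GEOGRAPHY).name)  # county
print(get_base_dimension(dims, DimType.SECTOR, "residential_sectors").name)
```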
- get_dimension(name: str) → DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles
Return the dimension with the given name.
- get_dimension_records(name: str) → DataFrame
Return a DataFrame containing the records for the dimension with the given name.
- list_supplemental_dimensions(dimension_type: DimensionType, sort_by=None) → list[DimensionBaseConfigWithFiles]
Return the supplemental dimensions matching dimension_type (if any).
- Parameters:
dimension_type (DimensionType)
sort_by (str | None) – If set, sort the dimensions by this dimension attribute.
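The sort_by parameter names a dimension attribute to order the result by. A minimal sketch of filter-then-sort, assuming the attribute is read directly off each dimension object (here with operator.attrgetter); `SupplementalDimension` is a hypothetical stand-in, and dimension types are plain strings for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class SupplementalDimension:
    """Hypothetical stand-in for a supplemental dimension config."""

    dimension_type: str
    name: str


def list_supplemental_dimensions(
    dims: list[SupplementalDimension], dimension_type: str, sort_by: str | None = None
) -> list[SupplementalDimension]:
    matches = [d for d in dims if d.dimension_type == dimension_type]
    if sort_by is not None:
        # Sort by the named attribute, e.g. sort_by="name".
        matches.sort(key=attrgetter(sort_by))
    return matches


dims = [
    SupplementalDimension("geography", "state"),
    SupplementalDimension("geography", "census_division"),
    SupplementalDimension("subsector", "subsector_group"),
]
print([d.name for d in list_supplemental_dimensions(dims, "geography", sort_by="name")])
```

Without sort_by, the sketch keeps the registration order of the matching dimensions.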
- get_base_to_supplemental_dimension_mappings_by_types(dimension_type: DimensionType) → list[MappingTableConfig]
Return the base-to-supplemental dimension mappings for the given dimension type (if any).
- get_base_to_supplemental_config(base_dim: DimensionBaseConfigWithFiles, supp_dim: DimensionBaseConfigWithFiles) → MappingTableConfig
Return the project’s base-to-supplemental dimension mapping config for the given base and supplemental dimensions.
- get_base_to_supplemental_mapping_records(base_dim: DimensionBaseConfigWithFiles, supp_dim: DimensionBaseConfigWithFiles) → DataFrame
Return the project’s base-to-supplemental dimension mapping records. Excludes rows with NULL to_id values.
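Excluding rows with NULL to_id values means that base records with no supplemental counterpart are dropped from the returned mapping. dsgrid returns a Spark DataFrame here; the sketch below uses an in-memory sqlite3 table as a stand-in so that it runs with the standard library alone, and the table name and rows are made up. In PySpark the equivalent filter would be `df.filter(df["to_id"].isNotNull())`.

```python
from __future__ import annotations

import sqlite3


def mapping_records(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (from_id, to_id) pairs, dropping rows whose to_id is NULL."""
    cursor = conn.execute(
        "SELECT from_id, to_id FROM mapping WHERE to_id IS NOT NULL ORDER BY from_id"
    )
    return cursor.fetchall()


# Made-up mapping rows: two base records map to "AL"; one maps to nothing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (from_id TEXT, to_id TEXT)")
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?)",
    [("01001", "AL"), ("01003", "AL"), ("99999", None)],
)
print(mapping_records(conn))  # [('01001', 'AL'), ('01003', 'AL')]
```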
- list_registered_dataset_ids() → list[str]
List registered datasets associated with the project.
- list_unregistered_dataset_ids() → list[str]
List unregistered datasets associated with the project.
- get_required_dimension_record_ids(dataset_id: str, dimension_type: DimensionType) → set[str]
Return the required base dimension record IDs for the dataset and dimension type.
- pydantic model dsgrid.config.project_config.DimensionsByCategoryModel
Defines the query names by base, subset, and supplemental category.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
- field base: list[str] [Required]
- field subset: list[str] [Required]
- field supplemental: list[str] [Required]
Dimension
- class dsgrid.config.dimension_config.DimensionConfig(*args, **kwargs)
Provides an interface to a DimensionModel.
- get_records_dataframe() → DataFrame
Return the records in a Spark DataFrame. The result is cached on the first call.
- get_unique_ids()
Return the unique IDs in a dimension’s records.
- Returns:
set of str
- Return type:
set
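get_records_dataframe loads the records once and reuses them afterwards, and get_unique_ids reduces those records to a set of IDs. The sketch below models both behaviors with functools.cached_property over CSV text. It stands in a list of dicts for the Spark DataFrame and assumes each record row has an `id` column; `RecordsBackedDimension` is a hypothetical class, not dsgrid's DimensionConfig.

```python
from __future__ import annotations

import csv
import io
from functools import cached_property


class RecordsBackedDimension:
    """Hypothetical stand-in for a dimension config whose records live in a CSV file."""

    def __init__(self, records_csv: str) -> None:
        self._records_csv = records_csv
        self.parse_count = 0  # how many times the records have been parsed

    @cached_property
    def records(self) -> list[dict[str, str]]:
        # Parsed on first access only; later accesses reuse the stored list.
        self.parse_count += 1
        return list(csv.DictReader(io.StringIO(self._records_csv)))

    def get_unique_ids(self) -> set[str]:
        return {row["id"] for row in self.records}


dim = RecordsBackedDimension("id,name\n01001,Autauga\n01003,Baldwin\n01005,Barbour\n")
print(sorted(dim.get_unique_ids()))  # ['01001', '01003', '01005']
print(dim.parse_count)  # 1
```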
Examples
from dsgrid.dimension.base_models import DimensionType
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.registry_database import DatabaseConnection
manager = RegistryManager.load(
DatabaseConnection(
hostname="dsgrid-registry.hpc.nrel.gov",
database="standard-scenarios",
),
offline_mode=True,
)
project = manager.project_manager.load_project("dsgrid_conus_2022")
geo_dim = project.config.get_base_dimension(DimensionType.GEOGRAPHY)
geo_dim.get_records_dataframe().show()
print(geo_dim.get_unique_ids())
# Show the records for a supplemental dimension.
project.config.get_dimension_records("commercial_end_uses").show()