Python API¶
Most users interact with dsgrid through the CLI, but you can also drive the full lifecycle — registry creation, dataset registration, project registration, dataset submittal, and project querying — from Python. This page documents the key classes for each workflow.
Connecting to a Registry¶
- pydantic model dsgrid.registry.common.DatabaseConnection[source]¶
Input information to connect to a registry database
Create a new model by parsing and validating input data from keyword arguments.
Raises a pydantic ValidationError if the input data cannot be validated to form a valid model.
- Fields:
url (str)
- classmethod from_file(path: Path | str) DatabaseConnection[source]¶
Create a connection from a SQLite file path.
Uses forward slashes in the URL for cross-platform compatibility.
- class dsgrid.registry.registry_manager.RegistryManager(params: RegistryManagerParams, db: RegistryDatabase)[source]¶
Manages registration of all projects and datasets.
- classmethod create(conn: DatabaseConnection, data_path: Path, data_store_type: DataStoreType = DataStoreType.FILESYSTEM, remote_path='s3://nrel-dsgrid-registry', user=None, scratch_dir=None, overwrite=False)[source]¶
Creates a new RegistryManager at the given path.
- Parameters:
conn (DatabaseConnection)
data_path (Path)
data_store_type (DataStoreType)
remote_path (str) – Path to the remote registry.
user (str | None) – Username to record in the registry.
scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.
overwrite (bool) – Overwrite the database if it exists.
- Return type:
RegistryManager
- __enter__() RegistryManager[source]¶
Enter context manager.
- property dataset_manager: DatasetRegistryManager¶
Return the dataset manager.
- property dimension_mapping_manager: DimensionMappingRegistryManager¶
Return the dimension mapping manager.
- property dimension_manager: DimensionRegistryManager¶
Return the dimension manager.
- property project_manager: ProjectRegistryManager¶
Return the project manager.
- classmethod load(conn: DatabaseConnection, remote_path='s3://nrel-dsgrid-registry', use_remote_data=None, offline_mode=True, user=None, no_prompts=False, scratch_dir=None)[source]¶
Loads a registry using the given database connection.
- Parameters:
conn (DatabaseConnection)
remote_path (str, optional) – path of the remote registry; default is REMOTE_REGISTRY
use_remote_data (bool, None) – If set, load data tables from remote_path. If not set, determine the behavior automatically from HPC or AWS EMR environment variables.
offline_mode (bool) – Load registry in offline mode; default is True
user (str) – username
no_prompts (bool) – If no_prompts is False, the user will be prompted to continue sync pulling the registry if lock files exist.
scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.
- Return type:
RegistryManager
Examples
>>> from dsgrid.registry.registry_manager import RegistryManager
>>> from dsgrid.registry.registry_database import DatabaseConnection
>>> manager = RegistryManager.load(
...     DatabaseConnection(
...         hostname="dsgrid-registry.hpc.nrel.gov",
...         database="standard-scenarios",
...     )
... )
Example — Load an existing registry¶
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.common import DatabaseConnection
conn = DatabaseConnection.from_file("path/to/registry.db")
manager = RegistryManager.load(conn, offline_mode=True)
Example — Create a new registry¶
from pathlib import Path
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.common import DatabaseConnection
conn = DatabaseConnection.from_file("my_registry.db")
manager = RegistryManager.create(
conn,
data_path=Path("registry_data"),
overwrite=True,
)
Browsing the Registry¶
The four manager properties on RegistryManager provide access to everything
stored in the registry.
- class dsgrid.registry.project_registry_manager.ProjectRegistryManager(path: Path, params, dataset_manager: DatasetRegistryManager, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: ProjectRegistryInterface)[source]
Manages registered dimension projects.
- get_by_id(project_id: str, version: str | None = None, conn: Connection | None = None) ProjectConfig[source]
Get the item matching the given ID. Returns from cache if already loaded.
- Parameters:
project_id (str)
version (str | None) – If None, return the latest version.
- Return type:
ProjectConfig
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- load_project(project_id: str, version: str | None = None, conn: Connection | None = None) Project[source]
Load a project from the registry.
- Parameters:
project_id (str)
version (str | None) – If None, load the latest version.
- Return type:
Project
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)[source]
Show registry in PrettyTable
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
- class dsgrid.registry.dataset_registry_manager.DatasetRegistryManager(path, fs_interface, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: DatasetRegistryInterface, store: DataStoreInterface)[source]
Manages registered dimension datasets.
- get_by_id(config_id: str, version: str | None = None, conn: Connection | None = None) DatasetConfig[source]
Get the item matching the given ID. Returns from cache if already loaded.
- Parameters:
config_id (str)
version (str | None) – If None, return the latest version.
- Return type:
DatasetConfig
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs: Any)[source]
Show registry in PrettyTable
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
- class dsgrid.registry.dimension_registry_manager.DimensionRegistryManager(path, params)[source]
Manages registered dimensions.
- get_by_id(config_id: str, version: str | None = None, conn: Connection | None = None) DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles[source]
Get the item matching the given ID. Returns from cache if already loaded.
- Parameters:
config_id (str)
version (str | None) – If None, return the latest version.
- Return type:
DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, dimension_ids: set[str] | None = None, return_table: bool = False, **kwargs)[source]
Show registry in PrettyTable
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
- class dsgrid.registry.dimension_mapping_registry_manager.DimensionMappingRegistryManager(path, params)[source]
Manages registered dimension mappings.
- get_by_id(mapping_id, version=None, conn: Connection | None = None) MappingTableConfig[source]
Get the item matching the given ID. Returns from cache if already loaded.
- Parameters:
mapping_id (str)
version (str | None) – If None, return the latest version.
- Return type:
MappingTableConfig
- Raises:
DSGValueNotRegistered – Raised if the ID is not stored.
- show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)[source]
Show registry in PrettyTable
- Parameters:
filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])
max_width – Max column width in PrettyTable, specify as a single value or as a dict of values by field name
drop_fields – List of field names not to show
Example — Inspect project dimensions¶
from dsgrid.dimension.base_models import DimensionType
project = manager.project_manager.load_project("dsgrid_conus_2022")
geo_dim = project.config.get_base_dimension(DimensionType.GEOGRAPHY)
geo_dim.get_records_dataframe().show()
print(geo_dim.get_unique_ids())
# Show the records for a supplemental dimension.
project.config.get_dimension_records("commercial_end_uses").show()
Projects and Datasets¶
- class dsgrid.project.Project(config: ProjectConfig, version: str, dataset_configs, dimension_mgr: DimensionRegistryManager, dimension_mapping_mgr: DimensionMappingRegistryManager, dataset_mgr: DatasetRegistryManager)[source]¶
Interface to a dsgrid project.
- property config: ProjectConfig¶
Returns the ProjectConfig.
- property version¶
Return the version of the project.
- Return type:
str
- is_registered(dataset_id)[source]¶
Provides the status of dataset_id within this project.
- Parameters:
dataset_id (str)
- Returns:
bool – True if dataset_id is in this project’s config and the dataset has been registered with (successfully submitted to) this project; False if dataset_id is in this project’s config but the dataset is not yet available.
- Raises:
DSGValueNotRegistered – If dataset_id is not in this project’s config.
- class dsgrid.config.project_config.ProjectConfig(model: ProjectConfigModel)[source]¶
Provides an interface to a ProjectConfigModel.
- get_base_dimension(dimension_type: DimensionType, dimension_name: str | None = None) DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles[source]¶
Return the base dimension matching dimension_type. If there is more than one base dimension of the given type, dimension_name is required.
- get_dimension(name: str) DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles[source]¶
Return the dimension with name.
- get_dimension_records(name: str) DataFrame[source]¶
Return a DataFrame containing the records for a dimension.
- get_dimension_record_ids(name: str) set[str][source]¶
Return the record IDs for the dimension identified by name.
- list_base_dimensions(dimension_type: DimensionType | None = None) list[DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles][source]¶
Return all base dimensions, optionally filtering to the dimension_type.
- list_supplemental_dimensions(dimension_type: DimensionType, sort_by=None) list[DimensionBaseConfigWithFiles][source]¶
Return the supplemental dimensions matching dimension_type (if any).
- Parameters:
dimension_type (DimensionType)
sort_by (str | None) – If set, sort the dimensions by this dimension attribute.
- get_base_to_supplemental_dimension_mappings_by_types(dimension_type: DimensionType) list[MappingTableConfig][source]¶
Return the base-to-supplemental dimension mappings for the dimension (if any).
- get_base_to_supplemental_mapping_records(base_dim: DimensionBaseConfigWithFiles, supp_dim: DimensionBaseConfigWithFiles) DataFrame[source]¶
Return the project’s base-to-supplemental dimension mapping records. Excludes rows with NULL to_id values.
- list_dimension_names(category: DimensionCategory | None = None) list[str][source]¶
Return query names for all dimensions in the project.
- Parameters:
category (DimensionCategory | None) – Optionally, filter return by category.
- class dsgrid.config.dataset_config.DatasetConfig(model)[source]¶
Provides an interface to a DatasetConfigModel.
- classmethod load_from_user_path(config_file: Path, data_base_dir: Path | None = None, missing_associations_base_dir: Path | None = None) DatasetConfig[source]¶
Load a dataset config from a user-provided config file.
The config file must contain a UserDataLayout with file paths. This method validates that all required files exist.
- Parameters:
config_file (Path) – Path to the dataset configuration file.
data_base_dir (Path | None, optional) – Base directory for data files. If set and data file paths are relative, prepend them with this path instead of using the config file’s parent directory.
missing_associations_base_dir (Path | None, optional) – Base directory for missing associations files. If set and paths are relative, prepend them with this path instead of using the config file’s parent directory.
- Return type:
DatasetConfig
- Raises:
DSGInvalidParameter – If the config doesn’t have a UserDataLayout or required files don’t exist.
- class dsgrid.config.dimension_config.DimensionConfig(*args, **kwargs)[source]¶
Provides an interface to a DimensionModel.
- get_records_dataframe() DataFrame¶
Return the records in a Spark DataFrame. Cached on first call.
- get_unique_ids() set[str]¶
Return the unique IDs in a dimension’s records.
- Return type:
set[str]
- class dsgrid.dimension.base_models.DimensionType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Dimension types
- METRIC = 'metric'¶
- GEOGRAPHY = 'geography'¶
- SECTOR = 'sector'¶
- SUBSECTOR = 'subsector'¶
- TIME = 'time'¶
- WEATHER_YEAR = 'weather_year'¶
- MODEL_YEAR = 'model_year'¶
- SCENARIO = 'scenario'¶
- classmethod from_column(column: str) DimensionType[source]¶
- static get_dimension_types_allowed_as_columns() set[DimensionType][source]¶
Return the dimension types that may exist in the data table as columns.
Config Generation¶
These helper functions create dataset and project configuration dictionaries programmatically, bypassing Pydantic validation. They are useful when building configs in scripts or tests where file-path resolution is not needed.
- dsgrid.config.dataset_config.make_unvalidated_dataset_config(dataset_id, metric_type: str, pivoted_dimension_type: DimensionType | None = None, data_classification='low', dataset_type=InputDatasetType.UNSPECIFIED, included_dimensions: list[DimensionType] | None = None, time_type: TimeDimensionType | None = None, use_project_geography_time_zone: bool = False, dimension_references: list[DimensionReferenceModel] | None = None, trivial_dimensions: list[DimensionType] | None = None, slim: bool = True, metadata: dict[str, Any] | None = None) dict[str, Any][source]¶
Create a dataset config as a dictionary, skipping validation.
- dsgrid.config.project_config.make_unvalidated_project_config(project_id: str, dataset_ids: Iterable[str], metric_types: Iterable[str], name: str | None = None, description: str | None = None, time_type: TimeDimensionType = TimeDimensionType.DATETIME) dict[str, Any][source]¶
Create a project config as a dictionary, skipping validation.
Example — Build a dataset config programmatically¶
import json5
from pathlib import Path
from dsgrid.config.dataset_config import make_unvalidated_dataset_config
config_dict = make_unvalidated_dataset_config(
dataset_id="my_dataset",
metric_type="energy",
)
# Write to a JSON5 file for registration.
Path("my_dataset/dataset.json5").write_text(json5.dumps(config_dict, indent=2))
Example — Build a project config programmatically¶
import json5
from pathlib import Path
from dsgrid.config.project_config import make_unvalidated_project_config
config_dict = make_unvalidated_project_config(
project_id="my_project",
dataset_ids=["dataset_a", "dataset_b"],
metric_types=["energy"],
name="My Project",
description="An example project.",
)
Path("my_project/project.json5").write_text(json5.dumps(config_dict, indent=2))
Registration¶
Registering dimensions¶
Dimensions are usually registered automatically as part of dataset or project
registration (via inline dimensions in the config). If you need to register
dimensions independently — for example, to share a dimension across multiple
datasets before registering any of them — use this method.
Registering dimension mappings¶
Dimension mappings are usually registered as part of submit_dataset. If you
need to register mappings independently — for example, to reuse a mapping
across multiple dataset submissions, or to use with a standalone dataset
query — use this method.
Registering a dataset¶
- DatasetRegistryManager.register(config_file: Path, submitter: str | None = None, log_message: str | None = None, context: RegistrationContext | None = None, data_base_dir: Path | None = None, missing_associations_base_dir: Path | None = None, requirements: DatasetDimensionRequirements | None = None)[source]¶
Registers a config file in the registry.
- Raises:
ValueError – Raised if the config_file is invalid.
DSGDuplicateValueRegistered – Raised if the config ID is already registered.
Registering a project¶
Submitting a dataset to a project¶
- ProjectRegistryManager.submit_dataset(**kwargs)¶
- ProjectRegistryManager.register_and_submit_dataset(**kwargs)¶
Example — Register and submit a dataset¶
from pathlib import Path
# Register the dataset.
manager.dataset_manager.register(
config_file=Path("my_dataset/dataset.json5"),
submitter="Jane Doe",
log_message="Initial registration of my_dataset",
)
# Submit it to a project with dimension mappings.
manager.project_manager.submit_dataset(
project_id="dsgrid_conus_2022",
dataset_id="my_dataset",
submitter="Jane Doe",
log_message="Submit my_dataset to dsgrid_conus_2022",
dimension_mapping_file=Path("my_dataset/dimension_mappings.json5"),
)
Queries¶
Project query data models¶
- pydantic model dsgrid.query.models.ProjectQueryModel[source]¶
Represents a user query on a Project.
- Fields:
project (dsgrid.query.models.ProjectQueryParamsModel)
- pydantic model dsgrid.query.models.ProjectQueryParamsModel[source]¶
Defines how to transform a project into a CompositeDataset
- Fields:
project_id (str)
dataset (dsgrid.query.models.DatasetModel)
excluded_dataset_ids (list[str])
include_dsgrid_dataset_components (bool)
version (str | None)
mapping_plans (list[dsgrid.query.dataset_mapping_plan.DatasetMappingPlan])
spark_conf_per_dataset (list[dsgrid.query.models.SparkConfByDataset])
- Validators:
check_unsupported_fields » all fields
check_invalid_dataset_ids » all fields
check_duplicate_dataset_ids » mapping_plans
check_duplicate_dataset_ids » spark_conf_per_dataset
- pydantic model dsgrid.query.models.QueryResultParamsModel[source]¶
Controls post-processing and storage of CompositeDatasets
- Fields:
replace_ids_with_names (bool)
aggregations (list[dsgrid.query.models.AggregationModel])
aggregate_each_dataset (bool)
reports (list[dsgrid.query.models.ReportInputModel])
column_type (dsgrid.query.models.ColumnType)
table_format (dsgrid.dataset.models.PivotedTableFormatModel | dsgrid.dataset.models.StackedTableFormatModel)
output_format (str)
sort_columns (list[str])
dimension_filters (list[dsgrid.dimension.dimension_filters.DimensionFilterExpressionModel | dsgrid.dimension.dimension_filters.DimensionFilterExpressionRawModel | dsgrid.dimension.dimension_filters.DimensionFilterColumnOperatorModel | dsgrid.dimension.dimension_filters.DimensionFilterBetweenColumnOperatorModel | dsgrid.dimension.dimension_filters.SubsetDimensionFilterModel | dsgrid.dimension.dimension_filters.SupplementalDimensionFilterColumnOperatorModel])
time_zone (str | Literal['geography'] | None)
- Validators:
check_pivot_dimension_type » all fields
check_format » output_format
check_column_type » all fields
- pydantic model dsgrid.query.models.DatasetModel[source]¶
Specifies the datasets to use in a project query.
- Fields:
dataset_id (str)
source_datasets (list[dsgrid.query.models.StandaloneDatasetModel | dsgrid.query.models.ProjectionDatasetModel])
expression (str | None)
params (dsgrid.query.models.ProjectQueryDatasetParamsModel)
- Validators:
handle_expression » expression
- pydantic model dsgrid.query.models.StandaloneDatasetModel[source]¶
A dataset with energy use data.
- Fields:
dataset_type (Literal[dsgrid.query.models.DatasetType.STANDALONE])
dataset_id (str)
- pydantic model dsgrid.query.models.ProjectionDatasetModel[source]¶
A dataset with growth rates that can be applied to a standalone dataset.
- Fields:
dataset_type (Literal[dsgrid.query.models.DatasetType.PROJECTION])
dataset_id (str)
initial_value_dataset_id (str)
growth_rate_dataset_id (str)
construction_method (dsgrid.query.models.DatasetConstructionMethod)
base_year (int | None)
- pydantic model dsgrid.query.models.AggregationModel[source]¶
Aggregate on one or more dimensions.
- Fields:
aggregation_function (Any)
dimensions (dsgrid.query.models.DimensionNamesModel)
- Validators:
check_aggregation_function » aggregation_function
check_for_metric » dimensions
- iter_dimensions_to_keep() Generator[tuple[DimensionType, ColumnModel], None, None][source]¶
Yield the dimension type and ColumnModel for each dimension to keep.
- list_dropped_dimensions() list[DimensionType][source]¶
Return a list of dimension types that will be dropped by the aggregation.
- pydantic model dsgrid.query.models.DimensionNamesModel[source]¶
Defines the list of dimensions to which the value columns should be aggregated. If a value is empty, that dimension will be aggregated and dropped from the table.
- Fields:
geography (list[str | dsgrid.query.models.ColumnModel])
metric (list[str | dsgrid.query.models.ColumnModel])
model_year (list[str | dsgrid.query.models.ColumnModel])
scenario (list[str | dsgrid.query.models.ColumnModel])
sector (list[str | dsgrid.query.models.ColumnModel])
subsector (list[str | dsgrid.query.models.ColumnModel])
time (list[str | dsgrid.query.models.ColumnModel])
weather_year (list[str | dsgrid.query.models.ColumnModel])
- Validators:
fix_columns » all fields
- pydantic model dsgrid.query.models.ColumnModel[source]¶
Defines one column in a SQL aggregation statement.
- Fields:
dimension_name (str)
function (Any)
alias (str | None)
- Validators:
handle_function » function
handle_alias » alias
Dataset query data models¶
A dataset query remaps a registered dataset’s dimensions without involving a project. This is useful when you want to map a dataset to alternate dimensions (e.g., county → state) as a standalone operation. The required dimension mappings must already be registered in the registry.
- pydantic model dsgrid.query.models.DatasetQueryModel[source]¶
Defines how to transform a dataset
- Fields:
dataset_id (str)
to_dimension_references (list[dsgrid.config.dimensions.DimensionReferenceModel])
mapping_plan (dsgrid.query.dataset_mapping_plan.DatasetMappingPlan | None)
time_based_data_adjustment (dsgrid.dimension.time.TimeBasedDataAdjustmentModel)
wrap_time_allowed (bool)
result (dsgrid.query.models.QueryResultParamsModel)
- dsgrid.query.models.make_dataset_query(name: str, dataset_id: str, to_dimension_references: list[DimensionReferenceModel], plan: DatasetMappingPlan | None = None) DatasetQueryModel[source]¶
Create a query to map a dataset to alternate dimensions.
- Parameters:
name (str)
dataset_id (str)
to_dimension_references (list[DimensionReferenceModel])
plan (DatasetMappingPlan | None) – Optional plan to control the mapping operation.
- Return type:
DatasetQueryModel
Query submission¶
- class dsgrid.query.query_submitter.ProjectQuerySubmitter(project: Project, *args, **kwargs)[source]¶
Submits queries for a project.
- submit(**kwargs)¶
Submit a query for execution
Example — Submit a project query¶
from dsgrid.query.models import (
AggregationModel,
DatasetModel,
DimensionNamesModel,
ProjectQueryParamsModel,
ProjectQueryModel,
QueryResultParamsModel,
StandaloneDatasetModel,
)
from pathlib import Path
from dsgrid.query.query_submitter import ProjectQuerySubmitter
project = manager.project_manager.load_project("dsgrid_conus_2022")
submitter = ProjectQuerySubmitter(project, output_dir=Path("query_output"))
query = ProjectQueryModel(
name="Total Electricity Use By State and Sector",
project=ProjectQueryParamsModel(
project_id="dsgrid_conus_2022",
dataset=DatasetModel(
dataset_id="electricity_use",
source_datasets=[
StandaloneDatasetModel(dataset_id="comstock_conus_2022_projected"),
StandaloneDatasetModel(dataset_id="resstock_conus_2022_projected"),
StandaloneDatasetModel(dataset_id="tempo_conus_2022_mapped"),
],
),
),
result=QueryResultParamsModel(
aggregations=[
AggregationModel(
dimensions=DimensionNamesModel(
geography=["state"],
metric=["electricity_collapsed"],
model_year=[],
scenario=[],
sector=["sector"],
subsector=[],
time=[],
weather_year=[],
),
aggregation_function="sum",
),
],
),
)
df = submitter.submit(query)
df.show()
Example — Map a dataset to alternate dimensions¶
from pathlib import Path
from dsgrid.config.dimensions import DimensionReferenceModel
from dsgrid.query.models import make_dataset_query
from dsgrid.query.query_submitter import DatasetQuerySubmitter
# Identify the target dimension (must already be registered).
to_dim_ref = DimensionReferenceModel(
dimension_type="geography",
dimension_id="<state-dimension-uuid>",
version="1.0.0",
)
query = make_dataset_query(
name="my_dataset_remapped_to_state",
dataset_id="my_dataset",
to_dimension_references=[to_dim_ref],
)
submitter = DatasetQuerySubmitter(output_dir=Path("dataset_query_output"))
df = submitter.submit(query, mgr=manager)
df.show()