Python API

Most users interact with dsgrid through the CLI, but you can also drive the full lifecycle — registry creation, dataset registration, project registration, dataset submittal, and project querying — from Python. This page documents the key classes for each workflow.

Connecting to a Registry

pydantic model dsgrid.registry.common.DatabaseConnection[source]

Input information to connect to a registry database.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Fields:
  • url (str)

classmethod from_file(path: Path | str) DatabaseConnection[source]

Create a connection from a SQLite file path.

Uses forward slashes in the URL for cross-platform compatibility.

get_filename() Path[source]

Return the filename from the URL. Only valid for SQLite databases.

Raises:

DSGInvalidParameter – Raised if the URL does not conform to the SQLite format.

try_get_filename() Path | None[source]

Return the filename from the URL, if file-based, otherwise None.
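
The forward-slash convention used by from_file can be sketched with the standard library. This is an illustrative helper, not dsgrid's actual implementation, and the sqlite:/// URL scheme is an assumption based on the SQLite-only methods above:

```python
from pathlib import Path, PureWindowsPath


def sqlite_url_from_file(path) -> str:
    # Illustrative only, not dsgrid code; assumes a sqlite:/// scheme.
    return f"sqlite:///{Path(path).as_posix()}"


# as_posix() is what guarantees forward slashes; on Windows,
# Path(r"data\registry.db").as_posix() gives "data/registry.db".
print(PureWindowsPath(r"data\registry.db").as_posix())  # data/registry.db
print(sqlite_url_from_file("path/to/registry.db"))  # sqlite:///path/to/registry.db
```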

class dsgrid.registry.registry_manager.RegistryManager(params: RegistryManagerParams, db: RegistryDatabase)[source]

Manages registration of all projects and datasets.

classmethod create(conn: DatabaseConnection, data_path: Path, data_store_type: DataStoreType = DataStoreType.FILESYSTEM, remote_path='s3://nrel-dsgrid-registry', user=None, scratch_dir=None, overwrite=False)[source]

Creates a new registry and returns a RegistryManager for the given connection.

Parameters:
  • conn (DatabaseConnection)

  • data_path (Path)

  • data_store_type (DataStoreType)

  • remote_path (str) – Path to the remote registry.

  • user (str | None) – Username.

  • scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.

  • overwrite (bool) – Overwrite the database if it exists.

Return type:

RegistryManager

dispose() None[source]

Dispose the database engine and release all connections.

__enter__() RegistryManager[source]

Enter context manager.

__exit__(exc_type, exc_val, exc_tb) None[source]

Exit context manager and dispose resources.

property dataset_manager: DatasetRegistryManager

Return the dataset manager.

property dimension_mapping_manager: DimensionMappingRegistryManager

Return the dimension mapping manager.

property dimension_manager: DimensionRegistryManager

Return the dimension manager.

property project_manager: ProjectRegistryManager

Return the project manager.

classmethod load(conn: DatabaseConnection, remote_path='s3://nrel-dsgrid-registry', use_remote_data=None, offline_mode=True, user=None, no_prompts=False, scratch_dir=None)[source]

Loads a registry for the given connection.

Parameters:
  • conn (DatabaseConnection)

  • remote_path (str, optional) – Path to the remote registry; default is REMOTE_REGISTRY.

  • use_remote_data (bool | None) – If set, load data tables from remote_path. If not set, auto-determine what to do based on HPC or AWS EMR environment variables.

  • offline_mode (bool) – Load the registry in offline mode; default is True.

  • user (str) – Username.

  • no_prompts (bool) – If False, the user is prompted before continuing a sync pull of the registry when lock files exist.

  • scratch_dir (None | Path) – Base directory for dsgrid temporary directories. Must be accessible on all compute nodes. Defaults to the current directory.

Return type:

RegistryManager

Examples

>>> from dsgrid.registry.registry_manager import RegistryManager
>>> from dsgrid.registry.common import DatabaseConnection
>>> manager = RegistryManager.load(
...     DatabaseConnection.from_file("standard-scenarios.db")
... )

Example — Load an existing registry

from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.common import DatabaseConnection

conn = DatabaseConnection.from_file("path/to/registry.db")
manager = RegistryManager.load(conn, offline_mode=True)

Example — Create a new registry

from pathlib import Path
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.common import DatabaseConnection

conn = DatabaseConnection.from_file("my_registry.db")
manager = RegistryManager.create(
    conn,
    data_path=Path("registry_data"),
    overwrite=True,
)

Browsing the Registry

The four manager properties on RegistryManager provide access to everything stored in the registry.

class dsgrid.registry.project_registry_manager.ProjectRegistryManager(path: Path, params, dataset_manager: DatasetRegistryManager, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: ProjectRegistryInterface)[source]

Manages registered projects.

get_by_id(project_id: str, version: str | None = None, conn: Connection | None = None) ProjectConfig[source]

Get the item matching the given ID. Returns from cache if already loaded.

Parameters:
  • project_id (str)

  • version (str | None) – If None, return the latest version.

Return type:

ProjectConfig

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

load_project(project_id: str, version: str | None = None, conn: Connection | None = None) Project[source]

Load a project from the registry.

Parameters:
  • project_id (str)

  • version (str)

Return type:

Project

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)[source]

Show the registry contents in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Max column width in PrettyTable; specify as a single value or as a dict of values by field name

  • drop_fields – List of field names not to show
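
The filter expressions accepted by show() can be illustrated with a small stand-alone sketch. This is a hypothetical matcher over plain dicts; the real parsing happens inside dsgrid:

```python
def matches(row: dict, expr: str) -> bool:
    # Hypothetical matcher for "Field==value" and "Field contains text".
    if " contains " in expr:
        field, needle = expr.split(" contains ", 1)
        return needle.strip().lower() in str(row.get(field.strip(), "")).lower()
    field, value = expr.split("==", 1)
    return str(row.get(field.strip(), "")) == value.strip()


rows = [
    {"Submitter": "jdoe", "Description": "ComStock CONUS 2022"},
    {"Submitter": "asmith", "Description": "TEMPO transportation"},
]
filtered = [r for r in rows if matches(r, "Description contains comstock")]
print([r["Submitter"] for r in filtered])  # ['jdoe']
```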

class dsgrid.registry.dataset_registry_manager.DatasetRegistryManager(path, fs_interface, dimension_manager: DimensionRegistryManager, dimension_mapping_manager: DimensionMappingRegistryManager, db: DatasetRegistryInterface, store: DataStoreInterface)[source]

Manages registered datasets.

get_by_id(config_id: str, version: str | None = None, conn: Connection | None = None) DatasetConfig[source]

Get the item matching the given ID. Returns from cache if already loaded.

Parameters:
  • config_id (str)

  • version (str | None) – If None, return the latest version.

Return type:

DatasetConfig

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs: Any)[source]

Show the registry contents in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Max column width in PrettyTable; specify as a single value or as a dict of values by field name

  • drop_fields – List of field names not to show

class dsgrid.registry.dimension_registry_manager.DimensionRegistryManager(path, params)[source]

Manages registered dimensions.

get_by_id(config_id: str, version: str | None = None, conn: Connection | None = None) DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles[source]

Get the item matching the given ID. Returns from cache if already loaded.

Parameters:
  • config_id (str)

  • version (str | None) – If None, return the latest version.

Return type:

DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, dimension_ids: set[str] | None = None, return_table: bool = False, **kwargs)[source]

Show the registry contents in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Max column width in PrettyTable; specify as a single value or as a dict of values by field name

  • drop_fields – List of field names not to show

class dsgrid.registry.dimension_mapping_registry_manager.DimensionMappingRegistryManager(path, params)[source]

Manages registered dimension mappings.

get_by_id(mapping_id, version=None, conn: Connection | None = None) MappingTableConfig[source]

Get the item matching the given ID. Returns from cache if already loaded.

Parameters:
  • mapping_id (str)

  • version (str | None) – If None, return the latest version.

Return type:

MappingTableConfig

Raises:

DSGValueNotRegistered – Raised if the ID is not stored.

show(conn: Connection | None = None, filters: list[str] | None = None, max_width: int | dict | None = None, drop_fields: list[str] | None = None, return_table: bool = False, **kwargs)[source]

Show the registry contents in a PrettyTable.

Parameters:
  • filters (list or tuple) – List of filter expressions for registry content (e.g., filters=["Submitter==USER", "Description contains comstock"])

  • max_width – Max column width in PrettyTable; specify as a single value or as a dict of values by field name

  • drop_fields – List of field names not to show

Example — Inspect project dimensions

from dsgrid.dimension.base_models import DimensionType

project = manager.project_manager.load_project("dsgrid_conus_2022")
geo_dim = project.config.get_base_dimension(DimensionType.GEOGRAPHY)
geo_dim.get_records_dataframe().show()
print(geo_dim.get_unique_ids())

# Show the records for a supplemental dimension.
project.config.get_dimension_records("commercial_end_uses").show()

Projects and Datasets

class dsgrid.project.Project(config: ProjectConfig, version: str, dataset_configs, dimension_mgr: DimensionRegistryManager, dimension_mapping_mgr: DimensionMappingRegistryManager, dataset_mgr: DatasetRegistryManager)[source]

Interface to a dsgrid project.

property config: ProjectConfig

Returns the ProjectConfig.

property version

Return the version of the project.

Return type:

str

is_registered(dataset_id)[source]

Provides the status of dataset_id within this project.

Parameters:

dataset_id (str)

Returns:

bool – True if dataset_id is in this project’s config and the dataset has been registered with (successfully submitted to) this project; False if dataset_id is in this project’s config but the dataset is not yet available.

Raises:

DSGValueNotRegistered – If dataset_id is not in this project’s config.

get_dataset(dataset_id, conn: Connection | None = None) Dataset[source]

Returns a Dataset. Calls load_dataset if it hasn’t already been loaded.

Parameters:

dataset_id (str)

Return type:

Dataset

load_dataset(dataset_id, conn: Connection | None = None) Dataset[source]

Loads a dataset.

Parameters:

dataset_id (str)

Return type:

Dataset

class dsgrid.config.project_config.ProjectConfig(model: ProjectConfigModel)[source]

Provides an interface to a ProjectConfigModel.

get_base_dimension(dimension_type: DimensionType, dimension_name: str | None = None) DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles[source]

Return the base dimension matching dimension_type. If there is more than one base dimension of the given type, dimension_name is required.

get_base_time_dimension() TimeDimensionBaseConfig[source]

Return the base dimension for time.

get_dimension(name: str) DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles[source]

Return the dimension with name.

get_dimension_records(name: str) DataFrame[source]

Return a DataFrame containing the records for a dimension.

get_dimension_record_ids(name: str) set[str][source]

Return the record IDs for the dimension identified by name.

list_base_dimensions(dimension_type: DimensionType | None = None) list[DimensionBaseConfigWithFiles | DimensionBaseConfigWithoutFiles][source]

Return all base dimensions, optionally filtering to the dimension_type.

list_supplemental_dimensions(dimension_type: DimensionType, sort_by=None) list[DimensionBaseConfigWithFiles][source]

Return the supplemental dimensions matching dimension_type (if any).

Parameters:
  • dimension_type (DimensionType)

  • sort_by (str | None) – If set, sort the dimensions by this dimension attribute.

get_base_to_supplemental_dimension_mappings_by_types(dimension_type: DimensionType) list[MappingTableConfig][source]

Return the base-to-supplemental dimension mappings for the dimension (if any).

get_base_to_supplemental_mapping_records(base_dim: DimensionBaseConfigWithFiles, supp_dim: DimensionBaseConfigWithFiles) DataFrame[source]

Return the project’s base-to-supplemental dimension mapping records. Excludes rows with NULL to_id values.
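
The NULL-exclusion rule can be illustrated with plain dicts standing in for mapping records (toy data; real records are Spark rows):

```python
# Toy stand-ins for base-to-supplemental mapping records.
records = [
    {"from_id": "06037", "to_id": "CA"},
    {"from_id": "99999", "to_id": None},  # unmapped record, excluded
]
kept = [r for r in records if r["to_id"] is not None]
print([r["from_id"] for r in kept])  # ['06037']
```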

list_dimension_names(category: DimensionCategory | None = None) list[str][source]

Return query names for all dimensions in the project.

Parameters:

category (DimensionCategory | None) – Optionally, filter return by category.

list_registered_dataset_ids() list[str][source]

List registered datasets associated with the project.

list_unregistered_dataset_ids() list[str][source]

List unregistered datasets associated with the project registry.

class dsgrid.config.dataset_config.DatasetConfig(model)[source]

Provides an interface to a DatasetConfigModel.

classmethod load_from_user_path(config_file: Path, data_base_dir: Path | None = None, missing_associations_base_dir: Path | None = None) DatasetConfig[source]

Load a dataset config from a user-provided config file.

The config file must contain a UserDataLayout with file paths. This method validates that all required files exist.

Parameters:
  • config_file (Path) – Path to the dataset configuration file.

  • data_base_dir (Path | None, optional) – Base directory for data files. If set and data file paths are relative, prepend them with this path instead of using the config file’s parent directory.

  • missing_associations_base_dir (Path | None, optional) – Base directory for missing associations files. If set and paths are relative, prepend them with this path instead of using the config file’s parent directory.

Return type:

DatasetConfig

Raises:

DSGInvalidParameter – If the config doesn’t have a UserDataLayout or required files don’t exist.

class dsgrid.config.dimension_config.DimensionConfig(*args, **kwargs)[source]

Provides an interface to a DimensionModel.

get_records_dataframe() DataFrame

Return the records in a spark dataframe. Cached on first call.
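
The cache-on-first-call behavior is analogous to a cached property. A stdlib sketch (not dsgrid code, and returning a list instead of a Spark DataFrame):

```python
from functools import cached_property


class RecordsHolder:
    # Illustrative stand-in: the real config returns a Spark DataFrame.
    def __init__(self, record_ids):
        self._record_ids = record_ids
        self.load_count = 0

    @cached_property
    def records(self):
        self.load_count += 1  # the expensive load runs only once
        return list(self._record_ids)


holder = RecordsHolder(["06037", "06073"])
holder.records
holder.records  # second access hits the cache
print(holder.load_count)  # 1
```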

get_unique_ids() set[str]

Return the unique IDs in a dimension’s records.

Returns:

set of str

Return type:

set

class dsgrid.dimension.base_models.DimensionType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Dimension types

METRIC = 'metric'
GEOGRAPHY = 'geography'
SECTOR = 'sector'
SUBSECTOR = 'subsector'
TIME = 'time'
WEATHER_YEAR = 'weather_year'
MODEL_YEAR = 'model_year'
SCENARIO = 'scenario'
classmethod from_column(column: str) DimensionType[source]
static get_dimension_types_allowed_as_columns() set[DimensionType][source]

Return the dimension types that may exist in the data table as columns.

static get_allowed_dimension_column_names() set[str][source]

Return the dimension column names that may exist in the data table.
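
How a column name maps back to a member can be sketched with a plain Enum. This is an illustrative subset with a hypothetical lookup rule; from_column's real logic may differ:

```python
from enum import Enum


class DimensionTypeSketch(Enum):
    # Illustrative subset of the members listed above.
    GEOGRAPHY = "geography"
    SECTOR = "sector"
    TIME = "time"

    @classmethod
    def from_column(cls, column: str) -> "DimensionTypeSketch":
        # Hypothetical lookup: treat the column name as the enum value.
        return cls(column)


print(DimensionTypeSketch.from_column("geography").name)  # GEOGRAPHY
```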

Config Generation

These helper functions create dataset and project configuration dictionaries programmatically, bypassing Pydantic validation. They are useful when building configs in scripts or tests where file-path resolution is not needed.

dsgrid.config.dataset_config.make_unvalidated_dataset_config(dataset_id, metric_type: str, pivoted_dimension_type: DimensionType | None = None, data_classification='low', dataset_type=InputDatasetType.UNSPECIFIED, included_dimensions: list[DimensionType] | None = None, time_type: TimeDimensionType | None = None, use_project_geography_time_zone: bool = False, dimension_references: list[DimensionReferenceModel] | None = None, trivial_dimensions: list[DimensionType] | None = None, slim: bool = True, metadata: dict[str, Any] | None = None) dict[str, Any][source]

Create a dataset config as a dictionary, skipping validation.

dsgrid.config.project_config.make_unvalidated_project_config(project_id: str, dataset_ids: Iterable[str], metric_types: Iterable[str], name: str | None = None, description: str | None = None, time_type: TimeDimensionType = TimeDimensionType.DATETIME) dict[str, Any][source]

Create a project config as a dictionary, skipping validation.

Example — Build a dataset config programmatically

import json5
from pathlib import Path
from dsgrid.config.dataset_config import make_unvalidated_dataset_config

config_dict = make_unvalidated_dataset_config(
    dataset_id="my_dataset",
    metric_type="energy",
)

# Write to a JSON5 file for registration.
Path("my_dataset/dataset.json5").write_text(json5.dumps(config_dict, indent=2))

Example — Build a project config programmatically

import json5
from pathlib import Path
from dsgrid.config.project_config import make_unvalidated_project_config

config_dict = make_unvalidated_project_config(
    project_id="my_project",
    dataset_ids=["dataset_a", "dataset_b"],
    metric_types=["energy"],
    name="My Project",
    description="An example project.",
)

Path("my_project/project.json5").write_text(json5.dumps(config_dict, indent=2))

Registration

Registering dimensions

Dimensions are usually registered automatically as part of dataset or project registration (via inline dimensions in the config). If you need to register dimensions independently — for example, to share a dimension across multiple datasets before registering any of them — use this method.

DimensionRegistryManager.register(config_file: Path, submitter: str, log_message: str) list[str][source]

Registers a config file in the registry.

Raises:
  • ValueError – Raised if the config_file is invalid.

  • DSGDuplicateValueRegistered – Raised if the config ID is already registered.

Registering dimension mappings

Dimension mappings are usually registered as part of submit_dataset. If you need to register mappings independently — for example, to reuse a mapping across multiple dataset submissions, or to use with a standalone dataset query — use this method.

DimensionMappingRegistryManager.register(config_file, submitter, log_message) list[str][source]

Registers a config file in the registry.

Raises:
  • ValueError – Raised if the config_file is invalid.

  • DSGDuplicateValueRegistered – Raised if the config ID is already registered.

Registering a dataset

DatasetRegistryManager.register(config_file: Path, submitter: str | None = None, log_message: str | None = None, context: RegistrationContext | None = None, data_base_dir: Path | None = None, missing_associations_base_dir: Path | None = None, requirements: DatasetDimensionRequirements | None = None)[source]

Registers a config file in the registry.

Raises:
  • ValueError – Raised if the config_file is invalid.

  • DSGDuplicateValueRegistered – Raised if the config ID is already registered.

Registering a project

ProjectRegistryManager.register(config_file: Path, submitter: str, log_message: str) None[source]

Register a project from a config file.

Submitting a dataset to a project

ProjectRegistryManager.submit_dataset(**kwargs)

Submit a registered dataset to a project.

ProjectRegistryManager.register_and_submit_dataset(**kwargs)

Register a dataset and submit it to a project in one step.

Example — Register and submit a dataset

from pathlib import Path

# Register the dataset.
manager.dataset_manager.register(
    config_file=Path("my_dataset/dataset.json5"),
    submitter="Jane Doe",
    log_message="Initial registration of my_dataset",
)

# Submit it to a project with dimension mappings.
manager.project_manager.submit_dataset(
    project_id="dsgrid_conus_2022",
    dataset_id="my_dataset",
    submitter="Jane Doe",
    log_message="Submit my_dataset to dsgrid_conus_2022",
    dimension_mapping_file=Path("my_dataset/dimension_mappings.json5"),
)

Queries

Project query data models

pydantic model dsgrid.query.models.ProjectQueryModel[source]

Represents a user query on a Project.

Fields:
  • project (dsgrid.query.models.ProjectQueryParamsModel)

serialize_cached_content() dict[str, Any][source]

Return a JSON-able representation of the model that can be used for caching purposes.

pydantic model dsgrid.query.models.ProjectQueryParamsModel[source]

Defines how to transform a project into a CompositeDataset.

Fields:
  • project_id (str)

  • dataset (dsgrid.query.models.DatasetModel)

  • excluded_dataset_ids (list[str])

  • include_dsgrid_dataset_components (bool)

  • version (str | None)

  • mapping_plans (list[dsgrid.query.dataset_mapping_plan.DatasetMappingPlan])

  • spark_conf_per_dataset (list[dsgrid.query.models.SparkConfByDataset])

Validators:
  • check_unsupported_fields » all fields

  • check_invalid_dataset_ids » all fields

  • check_duplicate_dataset_ids » mapping_plans

  • check_duplicate_dataset_ids » spark_conf_per_dataset

get_dataset_mapping_plan(dataset_id: str) DatasetMappingPlan | None[source]

Return the mapping plan for this dataset_id or None if the user did not specify one.

get_spark_conf(dataset_id: str) dict[str, Any][source]

Return the Spark settings to apply while processing dataset_id.
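
The lookup semantics can be sketched with plain dicts standing in for SparkConfByDataset entries (hypothetical dataset IDs and Spark keys):

```python
# Hypothetical entries standing in for spark_conf_per_dataset.
spark_conf_per_dataset = [
    {"dataset_id": "comstock", "conf": {"spark.sql.shuffle.partitions": "200"}},
    {"dataset_id": "tempo", "conf": {"spark.sql.shuffle.partitions": "1200"}},
]


def get_spark_conf(dataset_id: str) -> dict:
    # Return that dataset's settings, or an empty dict if none were given.
    for entry in spark_conf_per_dataset:
        if entry["dataset_id"] == dataset_id:
            return entry["conf"]
    return {}


print(get_spark_conf("tempo"))    # {'spark.sql.shuffle.partitions': '1200'}
print(get_spark_conf("unknown"))  # {}
```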

pydantic model dsgrid.query.models.QueryResultParamsModel[source]

Controls post-processing and storage of CompositeDatasets.

Fields:
  • replace_ids_with_names (bool)

  • aggregations (list[dsgrid.query.models.AggregationModel])

  • aggregate_each_dataset (bool)

  • reports (list[dsgrid.query.models.ReportInputModel])

  • column_type (dsgrid.query.models.ColumnType)

  • table_format (dsgrid.dataset.models.PivotedTableFormatModel | dsgrid.dataset.models.StackedTableFormatModel)

  • output_format (str)

  • sort_columns (list[str])

  • dimension_filters (list[dsgrid.dimension.dimension_filters.DimensionFilterExpressionModel | dsgrid.dimension.dimension_filters.DimensionFilterExpressionRawModel | dsgrid.dimension.dimension_filters.DimensionFilterColumnOperatorModel | dsgrid.dimension.dimension_filters.DimensionFilterBetweenColumnOperatorModel | dsgrid.dimension.dimension_filters.SubsetDimensionFilterModel | dsgrid.dimension.dimension_filters.SupplementalDimensionFilterColumnOperatorModel])

  • time_zone (str | Literal['geography'] | None)

Validators:
  • check_pivot_dimension_type » all fields

  • check_format » output_format

  • check_column_type » all fields

pydantic model dsgrid.query.models.DatasetModel[source]

Specifies the datasets to use in a project query.

Fields:
  • dataset_id (str)

  • source_datasets (list[dsgrid.query.models.StandaloneDatasetModel | dsgrid.query.models.ProjectionDatasetModel])

  • expression (str | None)

  • params (dsgrid.query.models.ProjectQueryDatasetParamsModel)

Validators:
  • handle_expression » expression

pydantic model dsgrid.query.models.StandaloneDatasetModel[source]

A dataset with energy use data.

Fields:
  • dataset_type (Literal[dsgrid.query.models.DatasetType.STANDALONE])

  • dataset_id (str)

get_dataset_id() str[source]

Return the primary dataset ID.

Return type:

str

list_source_dataset_ids() list[str][source]

Return a list of all source dataset IDs.

pydantic model dsgrid.query.models.ProjectionDatasetModel[source]

A dataset with growth rates that can be applied to a standalone dataset.

Fields:
  • dataset_type (Literal[dsgrid.query.models.DatasetType.PROJECTION])

  • dataset_id (str)

  • initial_value_dataset_id (str)

  • growth_rate_dataset_id (str)

  • construction_method (dsgrid.query.models.DatasetConstructionMethod)

  • base_year (int | None)

get_dataset_id() str[source]

Return the primary dataset ID.

Return type:

str

list_source_dataset_ids() list[str][source]

Return a list of all source dataset IDs.

pydantic model dsgrid.query.models.AggregationModel[source]

Aggregate on one or more dimensions.

Fields:
  • aggregation_function (Any)

  • dimensions (dsgrid.query.models.DimensionNamesModel)

Validators:
  • check_aggregation_function » aggregation_function

  • check_for_metric » dimensions

iter_dimensions_to_keep() Generator[tuple[DimensionType, ColumnModel], None, None][source]

Yield the dimension type and ColumnModel for each dimension to keep.

list_dropped_dimensions() list[DimensionType][source]

Return a list of dimension types that will be dropped by the aggregation.
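
The keep/drop behavior of an aggregation can be sketched with plain dicts: kept dimensions survive as group-by keys while dropped dimensions are summed away (toy data, sum assumed as the aggregation function):

```python
from collections import defaultdict

# Toy rows with three dimensions and one value column (illustrative only).
rows = [
    {"geography": "CO", "sector": "com", "subsector": "office", "value": 1.0},
    {"geography": "CO", "sector": "com", "subsector": "retail", "value": 2.0},
    {"geography": "CA", "sector": "com", "subsector": "office", "value": 4.0},
]

# Keep geography and sector; subsector is aggregated away and dropped.
keep = ("geography", "sector")
totals = defaultdict(float)
for row in rows:
    totals[tuple(row[d] for d in keep)] += row["value"]

print(dict(totals))  # {('CO', 'com'): 3.0, ('CA', 'com'): 4.0}
```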

pydantic model dsgrid.query.models.DimensionNamesModel[source]

Defines the list of dimensions to which the value columns should be aggregated. If a value is empty, that dimension will be aggregated and dropped from the table.

Fields:
  • geography (list[str | dsgrid.query.models.ColumnModel])

  • metric (list[str | dsgrid.query.models.ColumnModel])

  • model_year (list[str | dsgrid.query.models.ColumnModel])

  • scenario (list[str | dsgrid.query.models.ColumnModel])

  • sector (list[str | dsgrid.query.models.ColumnModel])

  • subsector (list[str | dsgrid.query.models.ColumnModel])

  • time (list[str | dsgrid.query.models.ColumnModel])

  • weather_year (list[str | dsgrid.query.models.ColumnModel])

Validators:
  • fix_columns » all fields

pydantic model dsgrid.query.models.ColumnModel[source]

Defines one column in a SQL aggregation statement.

Fields:
  • dimension_name (str)

  • function (Any)

  • alias (str | None)

Validators:
  • handle_function » function

  • handle_alias » alias

class dsgrid.query.models.ColumnType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Defines what the columns of a dataset table represent.

class dsgrid.query.models.DatasetType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Defines the type of a dataset in a query.

Dataset query data models

A dataset query remaps a registered dataset’s dimensions without involving a project. This is useful when you want to map a dataset to alternate dimensions (e.g., county → state) as a standalone operation. The required dimension mappings must already be registered in the registry.
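
The county → state use case can be sketched with plain dicts (hypothetical county FIPS codes; real mappings are registered mapping records and may also apply fractional weights, which a plain sum ignores):

```python
from collections import defaultdict

# Hypothetical county-to-state mapping records (from_id -> to_id).
county_to_state = {"08031": "CO", "08001": "CO", "06037": "CA"}

# Dataset values keyed by county.
values_by_county = {"08031": 10.0, "08001": 5.0, "06037": 7.0}

# Mapping re-keys the data to the target dimension, combining values
# that land on the same state.
mapped = defaultdict(float)
for county, value in values_by_county.items():
    mapped[county_to_state[county]] += value

print(dict(mapped))  # {'CO': 15.0, 'CA': 7.0}
```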

pydantic model dsgrid.query.models.DatasetQueryModel[source]

Defines how to transform a dataset.

Fields:
  • dataset_id (str)

  • to_dimension_references (list[dsgrid.config.dimensions.DimensionReferenceModel])

  • mapping_plan (dsgrid.query.dataset_mapping_plan.DatasetMappingPlan | None)

  • time_based_data_adjustment (dsgrid.dimension.time.TimeBasedDataAdjustmentModel)

  • wrap_time_allowed (bool)

  • result (dsgrid.query.models.QueryResultParamsModel)

dsgrid.query.models.make_dataset_query(name: str, dataset_id: str, to_dimension_references: list[DimensionReferenceModel], plan: DatasetMappingPlan | None = None) DatasetQueryModel[source]

Create a query to map a dataset to alternate dimensions.

Parameters:
  • name (str) – Name of the query.

  • dataset_id (str)

  • to_dimension_references (list[DimensionReferenceModel]) – References to the registered dimensions to which the dataset will be mapped.

  • plan (DatasetMappingPlan | None) – Optional plan to control the mapping operation.

Query submission

class dsgrid.query.query_submitter.ProjectQuerySubmitter(project: Project, *args, **kwargs)[source]

Submits queries for a project.

submit(**kwargs)

Submit a query for execution

class dsgrid.query.query_submitter.DatasetQuerySubmitter(output_dir: Path)[source]

Submits queries for a dataset.

submit(**kwargs)

Submit a query for execution

class dsgrid.query.query_submitter.CompositeDatasetQuerySubmitter(project: Project, *args, **kwargs)[source]

Submits queries for a composite dataset.

submit(**kwargs)

Submit a query for execution

Example — Submit a project query

from pathlib import Path

from dsgrid.query.models import (
    AggregationModel,
    DatasetModel,
    DimensionNamesModel,
    ProjectQueryParamsModel,
    ProjectQueryModel,
    QueryResultParamsModel,
    StandaloneDatasetModel,
)
from dsgrid.query.query_submitter import ProjectQuerySubmitter

project = manager.project_manager.load_project("dsgrid_conus_2022")
submitter = ProjectQuerySubmitter(project, output_dir=Path("query_output"))

query = ProjectQueryModel(
    name="Total Electricity Use By State and Sector",
    project=ProjectQueryParamsModel(
        project_id="dsgrid_conus_2022",
        dataset=DatasetModel(
            dataset_id="electricity_use",
            source_datasets=[
                StandaloneDatasetModel(dataset_id="comstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="resstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="tempo_conus_2022_mapped"),
            ],
        ),
    ),
    result=QueryResultParamsModel(
        aggregations=[
            AggregationModel(
                dimensions=DimensionNamesModel(
                    geography=["state"],
                    metric=["electricity_collapsed"],
                    model_year=[],
                    scenario=[],
                    sector=["sector"],
                    subsector=[],
                    time=[],
                    weather_year=[],
                ),
                aggregation_function="sum",
            ),
        ],
    ),
)

df = submitter.submit(query)
df.show()

Example — Map a dataset to alternate dimensions

from pathlib import Path
from dsgrid.config.dimensions import DimensionReferenceModel
from dsgrid.query.models import make_dataset_query
from dsgrid.query.query_submitter import DatasetQuerySubmitter

# Identify the target dimension (must already be registered).
to_dim_ref = DimensionReferenceModel(
    dimension_type="geography",
    dimension_id="<state-dimension-uuid>",
    version="1.0.0",
)

query = make_dataset_query(
    name="my_dataset_remapped_to_state",
    dataset_id="my_dataset",
    to_dimension_references=[to_dim_ref],
)

submitter = DatasetQuerySubmitter(output_dir=Path("dataset_query_output"))
df = submitter.submit(query, mgr=manager)
df.show()