Query
Data Models
- pydantic model dsgrid.query.models.ProjectQueryModel[source]
Represents a user query on a Project.
- Fields:
- field project: ProjectQueryParamsModel [Required]
Defines the datasets to use and how to transform them.
- field result: QueryResultParamsModel = QueryResultParamsModel(replace_ids_with_names=False, aggregations=[], aggregate_each_dataset=False, reports=[], column_type=ColumnType.DIMENSION_QUERY_NAMES, table_format=UnpivotedTableFormatModel(format_type=TableFormatType.UNPIVOTED), output_format='parquet', sort_columns=[], dimension_filters=[], time_zone=None)
Controls the output results.
- pydantic model dsgrid.query.models.ProjectQueryParamsModel[source]
Defines how to transform a project into a CompositeDataset.
- Fields:
- Validators:
check_unsupported_fields » all fields
- field dataset: DatasetModel [Required]
Definition of the dataset to create.
- field excluded_dataset_ids: list[str] = []
Datasets to exclude from the query.
- field include_dsgrid_dataset_components: bool = False
- field project_id: str [Required]
Project ID for the query.
- field spark_conf_per_dataset: list[SparkConfByDataset] = []
Apply these Spark configuration settings while a dataset is being processed.
- field version: str | None = None
Version of the project or dataset on which the query is based. Should not be set by the user.
- pydantic model dsgrid.query.models.QueryResultParamsModel[source]
Controls post-processing and storage of CompositeDatasets.
- Fields:
- Validators:
check_column_type » all fields
check_pivot_dimension_type » all fields
- field aggregate_each_dataset: bool = False
If True, aggregate each dataset before applying the expression that creates one overall dataset. This parameter must be set to True for queries that add or subtract datasets with different dimensionality. Defaults to False, which performs one aggregation on the overall dataset. WARNING: For a standard query that performs a union of datasets, setting this value to True could produce rows with duplicate dimension combinations, especially if one or more dimensions are also dropped.
- field aggregations: list[AggregationModel] = []
Defines how to aggregate dimensions.
- field column_type: ColumnType = ColumnType.DIMENSION_QUERY_NAMES
Controls whether result table columns are named by dimension query names (the default) or by dimension types. To register a result table as a derived dataset, this must be set to dimension_types.
- field dimension_filters: list[DimensionFilterExpressionModel | DimensionFilterExpressionRawModel | DimensionFilterColumnOperatorModel | DimensionFilterBetweenColumnOperatorModel | SubsetDimensionFilterModel | SupplementalDimensionFilterColumnOperatorModel] = [] (discriminated by filter_type)
Filters to apply to the result. Must contain columns in the result.
- field output_format: str = 'parquet'
Output file format: csv or parquet.
- field replace_ids_with_names: bool = False
Replace dimension record IDs with their names in result tables.
- field reports: list[ReportInputModel] = []
Run these pre-defined reports on the result.
- field sort_columns: list[str] = []
Sort the results by these dimension query names.
- field table_format: PivotedTableFormatModel | UnpivotedTableFormatModel = UnpivotedTableFormatModel(format_type=TableFormatType.UNPIVOTED) (discriminated by format_type)
Defines the format of the value columns of the result table.
- field time_zone: str | None = None
Convert the results to this time zone.
- validator check_format » output_format[source]
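For illustration, a minimal sketch of constructing result parameters; every other field keeps its default, including the unpivoted table format, and the sort column names are hypothetical dimension query names rather than values from a registered project:

from dsgrid.query.models import QueryResultParamsModel

# A minimal sketch. "state" and "sector" are hypothetical dimension query names.
result = QueryResultParamsModel(
    output_format="csv",             # csv or parquet
    replace_ids_with_names=True,     # show dimension record names instead of IDs
    sort_columns=["state", "sector"],
)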
- pydantic model dsgrid.query.models.DatasetModel[source]
Specifies the datasets to use in a project query.
- Fields:
- field dataset_id: str [Required]
Identifier for the resulting dataset.
- field expression: str | None = None
Expression to combine datasets. Default is to take a union of all datasets.
- field params: ProjectQueryDatasetParamsModel = ProjectQueryDatasetParamsModel(dimension_filters=[])
Parameters affecting datasets. Used for caching intermediate tables.
- field source_datasets: list[StandaloneDatasetModel | ProjectionDatasetModel] [Required] (discriminated by dataset_type)
Datasets from which to read. Each must be of type DatasetBaseModel.
- validator handle_expression » expression[source]
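As a sketch, a DatasetModel that combines two datasets; the dataset IDs here are hypothetical:

from dsgrid.query.models import DatasetModel, StandaloneDatasetModel

# Hypothetical dataset IDs. Because expression is omitted, the
# handle_expression validator defaults to a union of all source datasets.
dataset = DatasetModel(
    dataset_id="combined_energy_use",
    source_datasets=[
        StandaloneDatasetModel(dataset_id="dataset_a"),
        StandaloneDatasetModel(dataset_id="dataset_b"),
    ],
)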
- pydantic model dsgrid.query.models.StandaloneDatasetModel[source]
A dataset with energy use data.
- field dataset_id: str [Required]
Dataset identifier.
- field dataset_type: Literal[DatasetType.STANDALONE] = DatasetType.STANDALONE
- pydantic model dsgrid.query.models.ProjectionDatasetModel[source]
A dataset with growth rates that can be applied to a standalone dataset.
- Fields:
- field base_year: int | None = None
Base year of the dataset to use in growth rate application. Must be a year defined in the principal dataset’s model year dimension. If None, there must be only one model year in that dimension and it will be used.
- field construction_method: DatasetConstructionMethod = DatasetConstructionMethod.EXPONENTIAL_GROWTH
Specifier for the code that applies the growth rate to the principal dataset.
- field dataset_id: str [Required]
Identifier for the resulting dataset.
- field dataset_type: Literal[DatasetType.PROJECTION] = DatasetType.PROJECTION
- field growth_rate_dataset_id: str [Required]
Growth rate dataset identifier to apply to the principal dataset.
- field initial_value_dataset_id: str [Required]
Principal dataset identifier.
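A construction sketch with hypothetical dataset IDs; base_year and construction_method keep their defaults:

from dsgrid.query.models import ProjectionDatasetModel

# Hypothetical IDs. With base_year=None, the principal dataset must have
# exactly one model year; EXPONENTIAL_GROWTH is the default construction method.
projected = ProjectionDatasetModel(
    dataset_id="my_projected_dataset",
    initial_value_dataset_id="my_initial_values",
    growth_rate_dataset_id="my_growth_rates",
)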
- pydantic model dsgrid.query.models.AggregationModel[source]
Aggregate on one or more dimensions.
- Fields:
- field aggregation_function: Any = None
Must be a function name in pyspark.sql.functions.
- field dimensions: DimensionQueryNamesModel [Required]
Dimensions on which to aggregate.
- validator check_aggregation_function » aggregation_function[source]
- validator check_for_metric » dimensions[source]
- iter_dimensions_to_keep()[source]
Yield the dimension type and ColumnModel for each dimension to keep.
- pydantic model dsgrid.query.models.ColumnModel[source]
Defines one column in a SQL aggregation statement.
- Fields:
- field alias: str | None = None
Name of the resulting column.
- field dimension_query_name: str [Required]
- field function: Any = None
Function or name of function in pyspark.sql.functions.
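A sketch of a ColumnModel that aggregates a hypothetical model_year dimension and renames the result column:

from dsgrid.query.models import ColumnModel

# "model_year" is a hypothetical dimension query name; the string "max" is
# resolved against pyspark.sql.functions.
column = ColumnModel(
    dimension_query_name="model_year",
    function="max",
    alias="max_model_year",
)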
- pydantic model dsgrid.query.models.DimensionQueryNamesModel[source]
Defines the list of dimensions to which the value columns should be aggregated. If a value is empty, that dimension will be aggregated and dropped from the table.
- Fields:
- Validators:
fix_columns » all fields
- field geography: list[str | ColumnModel] [Required]
- field metric: list[str | ColumnModel] [Required]
- field model_year: list[str | ColumnModel] [Required]
- field scenario: list[str | ColumnModel] [Required]
- field sector: list[str | ColumnModel] [Required]
- field subsector: list[str | ColumnModel] [Required]
- field time: list[str | ColumnModel] [Required]
- field weather_year: list[str | ColumnModel] [Required]
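Every dimension type must be listed. A sketch that keeps hypothetical geography, metric, and sector query names and aggregates away the rest:

from dsgrid.query.models import DimensionQueryNamesModel

# Dimensions given an empty list are aggregated and dropped from the table.
# The non-empty values are hypothetical dimension query names.
dimensions = DimensionQueryNamesModel(
    geography=["county"],
    metric=["electricity_use"],
    model_year=[],
    scenario=[],
    sector=["sector"],
    subsector=[],
    time=[],
    weather_year=[],
)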
- pydantic model dsgrid.query.models.FilteredDatasetModel[source]
Filters to apply to a dataset.
- Fields:
- field dataset_id: str [Required]
Dataset ID.
- field filters: list[DimensionFilterExpressionModel | DimensionFilterExpressionRawModel | DimensionFilterColumnOperatorModel | DimensionFilterBetweenColumnOperatorModel | SubsetDimensionFilterModel | SupplementalDimensionFilterColumnOperatorModel] [Required] (discriminated by filter_type)
- pydantic model dsgrid.query.models.ReportInputModel[source]
- field inputs: Any = None
- field report_type: ReportType [Required]
- pydantic model dsgrid.query.models.SparkConfByDataset[source]
Defines a custom Spark configuration to use while running a query on a dataset.
- field conf: dict[str, Any] [Required]
- field dataset_id: str [Required]
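A sketch of a per-dataset Spark override; the dataset ID is hypothetical and the setting shown is a standard Spark property:

from dsgrid.query.models import SparkConfByDataset

# Apply a larger shuffle partition count only while this dataset is processed.
spark_conf = SparkConfByDataset(
    dataset_id="my_large_dataset",
    conf={"spark.sql.shuffle.partitions": "1200"},
)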
- class dsgrid.query.models.ColumnType(value)[source]
Defines what the columns of a dataset table represent.
Submission
Examples
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.registry_database import DatabaseConnection
from dsgrid.query.models import (
    AggregationModel,
    DatasetModel,
    DimensionQueryNamesModel,
    ProjectQueryParamsModel,
    ProjectQueryModel,
    QueryResultParamsModel,
    StandaloneDatasetModel,
)
from dsgrid.query.query_submitter import ProjectQuerySubmitter

manager = RegistryManager.load(
    DatabaseConnection(
        hostname="dsgrid-registry.hpc.nrel.gov",
        database="standard-scenarios",
    ),
    offline_mode=True,
)
project = manager.project_manager.load_project("dsgrid_conus_2022")
query = ProjectQueryModel(
    name="Total Electricity Use By State and Sector",
    project=ProjectQueryParamsModel(
        project_id="dsgrid_conus_2022",
        dataset=DatasetModel(
            dataset_id="electricity_use",
            source_datasets=[
                StandaloneDatasetModel(dataset_id="comstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="resstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="tempo_conus_2022_mapped"),
            ],
        ),
    ),
    result=QueryResultParamsModel(
        aggregations=[
            AggregationModel(
                dimensions=DimensionQueryNamesModel(
                    geography=["state"],
                    metric=["electricity_collapsed"],
                    model_year=[],
                    scenario=[],
                    sector=["sector"],
                    subsector=[],
                    time=[],
                    weather_year=[],
                ),
                aggregation_function="sum",
            ),
        ],
    ),
)
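ProjectQuerySubmitter, imported above, runs the query. A hedged sketch of submitting it, assuming the submitter is constructed with the loaded project and an output directory:

from pathlib import Path

# "query_output" is a hypothetical output directory.
ProjectQuerySubmitter(project, Path("query_output")).submit(query)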