Query
Data Models
- pydantic model dsgrid.query.models.ProjectQueryModel[source]
Represents a user query on a Project.
- Fields:
- field project: ProjectQueryParamsModel [Required]
Defines the datasets to use and how to transform them.
- field result: QueryResultParamsModel = QueryResultParamsModel(replace_ids_with_names=False, aggregations=[], aggregate_each_dataset=False, reports=[], column_type=ColumnType.DIMENSION_QUERY_NAMES, table_format=UnpivotedTableFormatModel(format_type=TableFormatType.UNPIVOTED), output_format='parquet', sort_columns=[], dimension_filters=[], time_zone=None)
Controls the output results.
- pydantic model dsgrid.query.models.ProjectQueryParamsModel[source]
Defines how to transform a project into a CompositeDataset.
- Fields:
- Validators:
check_unsupported_fields » all fields
- field dataset: DatasetModel [Required]
Definition of the dataset to create.
- field excluded_dataset_ids: list[str] = []
Datasets to exclude from the query.
- field include_dsgrid_dataset_components: bool = False
- field project_id: str [Required]
Project ID for the query.
- field spark_conf_per_dataset: list[SparkConfByDataset] = []
Apply these Spark configuration settings while a dataset is being processed.
- field version: str | None = None
Version of the project or dataset on which the query is based. Should not be set by the user.
- pydantic model dsgrid.query.models.QueryResultParamsModel[source]
Controls post-processing and storage of CompositeDatasets.
- Fields:
- Validators:
check_column_type » all fields
check_pivot_dimension_type » all fields
- field aggregate_each_dataset: bool = False
If True, aggregate each dataset before applying the expression that creates one overall dataset. This parameter must be set to True for queries that add or subtract datasets with different dimensionality. Defaults to False, which performs one aggregation on the overall dataset. WARNING: For a standard query that performs a union of datasets, setting this value to True could produce rows with duplicate dimension combinations, especially if one or more dimensions are also dropped.
- field aggregations: list[AggregationModel] = []
Defines how to aggregate dimensions.
- field column_type: ColumnType = ColumnType.DIMENSION_QUERY_NAMES
Controls whether result table columns are named by dimension query names (the default) or by dimension types. To register a result table as a derived dataset, this must be set to dimension_types.
- field dimension_filters: list[DimensionFilterExpressionModel | DimensionFilterExpressionRawModel | DimensionFilterColumnOperatorModel | DimensionFilterBetweenColumnOperatorModel | SubsetDimensionFilterModel | SupplementalDimensionFilterColumnOperatorModel] = [] (discriminated by filter_type)
Filters to apply to the result. Must contain columns in the result.
- field output_format: str = 'parquet'
Output file format: csv or parquet.
- field replace_ids_with_names: bool = False
Replace dimension record IDs with their names in result tables.
- field reports: list[ReportInputModel] = []
Run these pre-defined reports on the result.
- field sort_columns: list[str] = []
Sort the results by these dimension query names.
- field table_format: PivotedTableFormatModel | UnpivotedTableFormatModel = UnpivotedTableFormatModel(format_type=TableFormatType.UNPIVOTED) (discriminated by format_type)
Defines the format of the value columns of the result table.
- field time_zone: str | None = None
Convert the results to this time zone.
- validator check_format » output_format[source]
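For illustration, a minimal sketch of constructing result parameters; every other field keeps its default, including the unpivoted table format, and the sort column names are hypothetical dimension query names rather than values from a registered project:

from dsgrid.query.models import QueryResultParamsModel

# A minimal sketch. "state" and "sector" are hypothetical dimension query names.
result = QueryResultParamsModel(
    output_format="csv",             # csv or parquet
    replace_ids_with_names=True,     # show dimension record names instead of IDs
    sort_columns=["state", "sector"],
)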
- pydantic model dsgrid.query.models.DatasetModel[source]
Specifies the datasets to use in a project query.
- Fields:
- field dataset_id: str [Required]
Identifier for the resulting dataset.
- field expression: str | None = None
Expression to combine datasets. Default is to take a union of all datasets.
- field params: ProjectQueryDatasetParamsModel = ProjectQueryDatasetParamsModel(dimension_filters=[])
Parameters affecting datasets. Used for caching intermediate tables.
- field source_datasets: list[StandaloneDatasetModel | ProjectionDatasetModel] [Required] (discriminated by dataset_type)
Datasets from which to read. Each must be of type DatasetBaseModel.
- validator handle_expression » expression[source]
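As a sketch, a DatasetModel that combines two datasets; the dataset IDs here are hypothetical:

from dsgrid.query.models import DatasetModel, StandaloneDatasetModel

# Hypothetical dataset IDs. Because expression is omitted, the
# handle_expression validator defaults to a union of all source datasets.
dataset = DatasetModel(
    dataset_id="combined_energy_use",
    source_datasets=[
        StandaloneDatasetModel(dataset_id="dataset_a"),
        StandaloneDatasetModel(dataset_id="dataset_b"),
    ],
)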
- pydantic model dsgrid.query.models.StandaloneDatasetModel[source]
A dataset with energy use data.
- field dataset_id: str [Required]
Dataset identifier.
- field dataset_type: Literal[DatasetType.STANDALONE] = DatasetType.STANDALONE
- pydantic model dsgrid.query.models.ProjectionDatasetModel[source]
A dataset with growth rates that can be applied to a standalone dataset.
- Fields:
- field base_year: int | None = None
Base year of the dataset to use in growth rate application. Must be a year defined in the principal dataset’s model year dimension. If None, there must be only one model year in that dimension and it will be used.
- field construction_method: DatasetConstructionMethod = DatasetConstructionMethod.EXPONENTIAL_GROWTH
Specifier for the code that applies the growth rate to the principal dataset.
- field dataset_id: str [Required]
Identifier for the resulting dataset.
- field dataset_type: Literal[DatasetType.PROJECTION] = DatasetType.PROJECTION
- field growth_rate_dataset_id: str [Required]
Growth rate dataset identifier to apply to the principal dataset.
- field initial_value_dataset_id: str [Required]
Principal dataset identifier.
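A construction sketch with hypothetical dataset IDs; base_year and construction_method keep their defaults:

from dsgrid.query.models import ProjectionDatasetModel

# Hypothetical IDs. With base_year=None, the principal dataset must have
# exactly one model year; EXPONENTIAL_GROWTH is the default construction method.
projected = ProjectionDatasetModel(
    dataset_id="my_projected_dataset",
    initial_value_dataset_id="my_initial_values",
    growth_rate_dataset_id="my_growth_rates",
)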
- pydantic model dsgrid.query.models.AggregationModel[source]
Aggregate on one or more dimensions.
- Fields:
- field aggregation_function: Any = None
Must be a function name in pyspark.sql.functions.
- field dimensions: DimensionQueryNamesModel [Required]
Dimensions on which to aggregate.
- validator check_aggregation_function » aggregation_function[source]
- validator check_for_metric » dimensions[source]
- iter_dimensions_to_keep()[source]
Yield the dimension type and ColumnModel for each dimension to keep.
- pydantic model dsgrid.query.models.ColumnModel[source]
Defines one column in a SQL aggregation statement.
- Fields:
- field alias: str | None = None
Name of the resulting column.
- field dimension_query_name: str [Required]
- field function: Any = None
Function or name of function in pyspark.sql.functions.
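A sketch of a ColumnModel that aggregates a hypothetical model_year dimension and renames the result column:

from dsgrid.query.models import ColumnModel

# "model_year" is a hypothetical dimension query name; the string "max" is
# resolved against pyspark.sql.functions.
column = ColumnModel(
    dimension_query_name="model_year",
    function="max",
    alias="max_model_year",
)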
- pydantic model dsgrid.query.models.DimensionQueryNamesModel[source]
Defines the list of dimensions to which the value columns should be aggregated. If a value is empty, that dimension will be aggregated and dropped from the table.
- Fields:
- Validators:
fix_columns » all fields
- field geography: list[str | ColumnModel] [Required]
- field metric: list[str | ColumnModel] [Required]
- field model_year: list[str | ColumnModel] [Required]
- field scenario: list[str | ColumnModel] [Required]
- field sector: list[str | ColumnModel] [Required]
- field subsector: list[str | ColumnModel] [Required]
- field time: list[str | ColumnModel] [Required]
- field weather_year: list[str | ColumnModel] [Required]
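Every dimension type must be listed. A sketch that keeps hypothetical geography, metric, and sector query names and aggregates away the rest:

from dsgrid.query.models import DimensionQueryNamesModel

# Dimensions given an empty list are aggregated and dropped from the table.
# The non-empty values are hypothetical dimension query names.
dimensions = DimensionQueryNamesModel(
    geography=["county"],
    metric=["electricity_use"],
    model_year=[],
    scenario=[],
    sector=["sector"],
    subsector=[],
    time=[],
    weather_year=[],
)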
- pydantic model dsgrid.query.models.FilteredDatasetModel[source]
Filters to apply to a dataset.
- Fields:
- field dataset_id: str [Required]
Dataset ID.
- field filters: list[DimensionFilterExpressionModel | DimensionFilterExpressionRawModel | DimensionFilterColumnOperatorModel | DimensionFilterBetweenColumnOperatorModel | SubsetDimensionFilterModel | SupplementalDimensionFilterColumnOperatorModel] [Required] (discriminated by filter_type)
- pydantic model dsgrid.query.models.ReportInputModel[source]
- field inputs: Any = None
- field report_type: ReportType [Required]
- pydantic model dsgrid.query.models.SparkConfByDataset[source]
Defines a custom Spark configuration to use while running a query on a dataset.
- field conf: dict[str, Any] [Required]
- field dataset_id: str [Required]
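A sketch of a per-dataset Spark override; the dataset ID is hypothetical and the setting shown is a standard Spark property:

from dsgrid.query.models import SparkConfByDataset

# Apply a larger shuffle partition count only while this dataset is processed.
spark_conf = SparkConfByDataset(
    dataset_id="my_large_dataset",
    conf={"spark.sql.shuffle.partitions": "1200"},
)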
- class dsgrid.query.models.ColumnType(value)[source]
Defines what the columns of a dataset table represent.
Submission
Examples
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.registry_database import DatabaseConnection
from dsgrid.query.models import (
    AggregationModel,
    DatasetModel,
    DimensionQueryNamesModel,
    ProjectQueryParamsModel,
    ProjectQueryModel,
    QueryResultParamsModel,
    StandaloneDatasetModel,
)
from dsgrid.query.query_submitter import ProjectQuerySubmitter

manager = RegistryManager.load(
    DatabaseConnection(
        hostname="dsgrid-registry.hpc.nrel.gov",
        database="standard-scenarios",
    ),
    offline_mode=True,
)
project = manager.project_manager.load_project("dsgrid_conus_2022")
query = ProjectQueryModel(
    name="Total Electricity Use By State and Sector",
    project=ProjectQueryParamsModel(
        project_id="dsgrid_conus_2022",
        dataset=DatasetModel(
            dataset_id="electricity_use",
            source_datasets=[
                StandaloneDatasetModel(dataset_id="comstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="resstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="tempo_conus_2022_mapped"),
            ],
        ),
    ),
    result=QueryResultParamsModel(
        aggregations=[
            AggregationModel(
                dimensions=DimensionQueryNamesModel(
                    geography=["state"],
                    metric=["electricity_collapsed"],
                    model_year=[],
                    scenario=[],
                    sector=["sector"],
                    subsector=[],
                    time=[],
                    weather_year=[],
                ),
                aggregation_function="sum",
            ),
        ],
    ),
)
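ProjectQuerySubmitter, imported above, runs the query. A hedged sketch of submitting it, assuming the submitter is constructed with the loaded project and an output directory:

from pathlib import Path

# "query_output" is a hypothetical output directory.
ProjectQuerySubmitter(project, Path("query_output")).submit(query)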