Query¶
Data Models¶
- pydantic model dsgrid.query.models.ProjectQueryModel¶
Represents a user query on a Project.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
__init__ uses __pydantic_self__ instead of the more common self for the first arg to allow self as a field name.
- Fields:
- field project: ProjectQueryParamsModel [Required]¶
Defines the datasets to use and how to transform them.
- field result: QueryResultParamsModel = QueryResultParamsModel(replace_ids_with_names=False, aggregations=[], reports=[], column_type=ColumnType.DIMENSION_QUERY_NAMES, table_format=UnpivotedTableFormatModel(format_type=TableFormatType.UNPIVOTED, value_column='value'), output_format='parquet', sort_columns=[], dimension_filters=[], time_zone=None)¶
Controls the output results.
- serialize_cached_content()¶
Return a JSON representation of the model that can be used for caching purposes along with a hash that uniquely identifies it.
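The caching pattern that serialize_cached_content describes can be sketched with only the standard library. This is an illustration of the idea (a deterministic JSON serialization plus an identifying hash), not dsgrid's implementation; serialize_with_hash and its dict argument are hypothetical stand-ins.

```python
import hashlib
import json

def serialize_with_hash(model_dict):
    """Serialize a model to canonical JSON and compute a hash that identifies it."""
    # sort_keys makes the serialization deterministic, so models with equal
    # content produce equal hashes regardless of key insertion order
    text = json.dumps(model_dict, sort_keys=True)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return text, digest

text, cache_key = serialize_with_hash({"project_id": "my_project", "version": None})
```

Because the hash depends only on content, it can serve as a cache key for intermediate results.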
- pydantic model dsgrid.query.models.ProjectQueryParamsModel¶
Defines how to transform a project into a CompositeDataset.
- Fields:
- Validators:
check_unsupported_fields » all fields
- field dataset: DatasetModel [Required]¶
Definition of the dataset to create.
- Validated by: check_unsupported_fields
- field excluded_dataset_ids: list[str] = []¶
Datasets to exclude from the query.
- Validated by: check_unsupported_fields
- field include_dsgrid_dataset_components: bool = False¶
- Validated by: check_unsupported_fields
- field project_id: str [Required]¶
Project ID for the query.
- Validated by: check_unsupported_fields
- field spark_conf_per_dataset: list[SparkConfByDataset] = []¶
Apply these Spark configuration settings while a dataset is being processed.
- Validated by: check_unsupported_fields
- field version: str | None = None¶
Version of the project or dataset on which the query is based. Should not be set by the user.
- Validated by: check_unsupported_fields
- validator check_unsupported_fields » all fields¶
- get_spark_conf(dataset_id) → dict[str, Any]¶
Return the Spark settings to apply while processing dataset_id.
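The lookup that get_spark_conf performs can be sketched in plain Python. Here plain dicts stand in for SparkConfByDataset entries; the real method lives on the model, so treat this as an illustration of the behavior, not the actual implementation.

```python
from typing import Any

def get_spark_conf(spark_conf_per_dataset: list[dict], dataset_id: str) -> dict[str, Any]:
    """Return the Spark settings registered for dataset_id, or an empty dict."""
    for entry in spark_conf_per_dataset:
        if entry["dataset_id"] == dataset_id:
            return entry["conf"]
    return {}
```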
- pydantic model dsgrid.query.models.QueryResultParamsModel¶
Controls post-processing and storage of CompositeDatasets.
- Fields:
- Validators:
check_column_type » all fields
check_format » output_format
- field aggregations: list[AggregationModel] = []¶
Defines how to aggregate dimensions.
- Validated by: check_column_type
- field column_type: ColumnType = ColumnType.DIMENSION_QUERY_NAMES¶
Controls whether result table columns are named by dimension type or by dimension query name (the default). To register a result table as a derived dataset, this must be set to dimension_types.
- Validated by: check_column_type
- field dimension_filters: list[DimensionFilterExpressionModel | DimensionFilterExpressionRawModel | DimensionFilterColumnOperatorModel | DimensionFilterBetweenColumnOperatorModel | SubsetDimensionFilterModel | SupplementalDimensionFilterColumnOperatorModel] = []¶
Filters to apply to the result. Filtered columns must exist in the result.
- Validated by: check_column_type
- field output_format: str = 'parquet'¶
Output file format: csv or parquet.
- Validated by: check_column_type, check_format
- field replace_ids_with_names: bool = False¶
Replace dimension record IDs with their names in result tables.
- Validated by: check_column_type
- field reports: list[ReportInputModel] = []¶
Run these pre-defined reports on the result.
- Validated by: check_column_type
- field sort_columns: list[str] = []¶
Sort the results by these dimension query names.
- Validated by: check_column_type
- field table_format: PivotedTableFormatModel | UnpivotedTableFormatModel = UnpivotedTableFormatModel(format_type=TableFormatType.UNPIVOTED, value_column='value')¶
Defines the format of the value columns of the result table.
- Validated by: check_column_type
- field time_zone: str | None = None¶
Convert the results to this time zone.
- Validated by: check_column_type
- validator check_column_type » all fields¶
- validator check_format » output_format¶
- pydantic model dsgrid.query.models.DatasetModel¶
Specifies the datasets to use in a project query.
- Fields:
- Validators:
handle_expression » expression
- field dataset_id: str [Required]¶
Identifier for the resulting dataset.
- field expression: str | None = None¶
Expression to combine datasets. Default is to take a union of all datasets.
- Validated by: handle_expression
- field params: ProjectQueryDatasetParamsModel = ProjectQueryDatasetParamsModel(dimension_filters=[])¶
Parameters affecting datasets. Used for caching intermediate tables.
- field source_datasets: list[StandaloneDatasetModel | ProjectionDatasetModel] [Required]¶
Datasets from which to read. Each must be of type DatasetBaseModel.
- validator handle_expression » expression¶
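The defaulting behavior that handle_expression provides can be sketched as follows. The "|" union syntax used here is an assumption for illustration; verify the actual expression grammar against dsgrid before relying on it.

```python
def handle_expression(expression, source_dataset_ids):
    """If no expression is given, default to the union of all source datasets."""
    if expression is not None:
        return expression
    # Assumed illustration: express a union by joining dataset IDs with "|"
    return " | ".join(source_dataset_ids)
```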
- pydantic model dsgrid.query.models.StandaloneDatasetModel¶
A dataset with energy use data.
- field dataset_id: str [Required]¶
Dataset identifier
- field dataset_type: Literal[DatasetType.STANDALONE] = DatasetType.STANDALONE¶
- get_dataset_id() → str¶
Return the primary dataset ID.
- Return type:
str
- pydantic model dsgrid.query.models.ProjectionDatasetModel¶
A dataset with growth rates that can be applied to a standalone dataset.
- Fields:
- field base_year: int | None = None¶
Base year of the dataset to use in growth rate application. Must be a year defined in the principal dataset’s model year dimension. If None, there must be only one model year in that dimension and it will be used.
- field construction_method: DatasetConstructionMethod = DatasetConstructionMethod.EXPONENTIAL_GROWTH¶
Specifier for the code that applies the growth rate to the principal dataset.
- field dataset_id: str [Required]¶
Identifier for the resulting dataset.
- field dataset_type: Literal[DatasetType.PROJECTION] = DatasetType.PROJECTION¶
- field growth_rate_dataset_id: str [Required]¶
Growth rate dataset identifier to apply to the principal dataset.
- field initial_value_dataset_id: str [Required]¶
Principal dataset identifier.
- get_dataset_id() → str¶
Return the primary dataset ID.
- Return type:
str
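The EXPONENTIAL_GROWTH construction method suggests the standard compound-growth calculation, sketched below. This is an illustration of that formula under an assumed annual rate convention; dsgrid's actual construction code may apply rates differently.

```python
def apply_exponential_growth(initial_value, annual_growth_rate, base_year, model_year):
    """Project a base-year value forward with compound annual growth."""
    # value_t = value_0 * (1 + r)^(t - t0)
    return initial_value * (1.0 + annual_growth_rate) ** (model_year - base_year)
```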
- pydantic model dsgrid.query.models.AggregationModel¶
Aggregate on one or more dimensions.
- Fields:
- Validators:
check_aggregation_function » aggregation_function
check_for_metric » dimensions
- field aggregation_function: Any = None¶
Must be the name of a function in pyspark.sql.functions.
- Validated by: check_aggregation_function
- field dimensions: DimensionQueryNamesModel [Required]¶
Dimensions on which to aggregate.
- Validated by: check_for_metric
- validator check_aggregation_function » aggregation_function¶
- validator check_for_metric » dimensions¶
- iter_dimensions_to_keep()¶
Yield the dimension type and ColumnModel for each dimension to keep.
- list_dropped_dimensions()¶
Return a list of dimension types that will be dropped by the aggregation.
- serialize_aggregation_function(function, _)¶
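The behavior of list_dropped_dimensions and iter_dimensions_to_keep follows from the DimensionQueryNamesModel rule that an empty list means the dimension is aggregated away. A sketch using a plain dict in place of the model (an illustrative stand-in, not the actual methods):

```python
def list_dropped_dimensions(dimensions: dict[str, list]) -> list[str]:
    """Dimension types mapped to an empty list are aggregated away and dropped."""
    return [dim for dim, columns in dimensions.items() if not columns]

def iter_dimensions_to_keep(dimensions: dict[str, list]):
    """Yield (dimension type, column) for each column that survives the aggregation."""
    for dim, columns in dimensions.items():
        for column in columns:
            yield dim, column
```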
- pydantic model dsgrid.query.models.ColumnModel¶
Defines one column in a SQL aggregation statement.
- Fields:
- Validators:
- field alias: str | None = None¶
Name of the resulting column.
- field dimension_query_name: str [Required]¶
- field function: Any | None = None¶
Function or name of a function in pyspark.sql.functions.
- get_column_name()¶
- serialize_function(function, _)¶
- pydantic model dsgrid.query.models.DimensionQueryNamesModel¶
Defines the list of dimensions to which the value columns should be aggregated. If a value is empty, that dimension will be aggregated and dropped from the table.
- Fields:
- Validators:
fix_columns » all fields
- field geography: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field metric: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field model_year: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field scenario: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field sector: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field subsector: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field time: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- field weather_year: list[str | ColumnModel] [Required]¶
- Validated by: fix_columns
- validator fix_columns » all fields¶
- pydantic model dsgrid.query.models.FilteredDatasetModel¶
Filters to apply to a dataset.
- Fields:
- field dataset_id: str [Required]¶
Dataset ID
- field filters: list[DimensionFilterExpressionModel | DimensionFilterExpressionRawModel | DimensionFilterColumnOperatorModel | DimensionFilterBetweenColumnOperatorModel | SubsetDimensionFilterModel | SupplementalDimensionFilterColumnOperatorModel] [Required]¶
- pydantic model dsgrid.query.models.ReportInputModel¶
- field inputs: Any = None¶
- field report_type: ReportType [Required]¶
- pydantic model dsgrid.query.models.SparkConfByDataset¶
Defines a custom Spark configuration to use while running a query on a dataset.
- field conf: dict[str, Any] [Required]¶
- field dataset_id: str [Required]¶
- class dsgrid.query.models.ColumnType(value)¶
Defines what the columns of a dataset table represent.
- class dsgrid.query.models.DatasetType(value)¶
Defines the type of a dataset in a query.
- class dsgrid.query.models.ReportType(value)¶
Pre-defined reports
Submission¶
Examples¶
from dsgrid.dimension.base_models import DimensionType
from dsgrid.registry.registry_manager import RegistryManager
from dsgrid.registry.registry_database import DatabaseConnection
from dsgrid.query.models import (
    AggregationModel,
    DatasetModel,
    DimensionQueryNamesModel,
    ProjectQueryParamsModel,
    ProjectQueryModel,
    QueryResultParamsModel,
    StandaloneDatasetModel,
)
from dsgrid.query.query_submitter import ProjectQuerySubmitter

manager = RegistryManager.load(
    DatabaseConnection(
        hostname="dsgrid-registry.hpc.nrel.gov",
        database="standard-scenarios",
    ),
    offline_mode=True,
)
project = manager.project_manager.load_project("dsgrid_conus_2022")
query = ProjectQueryModel(
    name="Total Electricity Use By State and Sector",
    project=ProjectQueryParamsModel(
        project_id="dsgrid_conus_2022",
        dataset=DatasetModel(
            dataset_id="electricity_use",
            source_datasets=[
                StandaloneDatasetModel(dataset_id="comstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="resstock_conus_2022_projected"),
                StandaloneDatasetModel(dataset_id="tempo_conus_2022_mapped"),
            ],
        ),
    ),
    result=QueryResultParamsModel(
        aggregations=[
            AggregationModel(
                dimensions=DimensionQueryNamesModel(
                    geography=["state"],
                    metric=["electricity_collapsed"],
                    model_year=[],
                    scenario=[],
                    sector=["sector"],
                    subsector=[],
                    time=[],
                    weather_year=[],
                ),
                aggregation_function="sum",
            ),
        ],
    ),
)
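The example builds the query but never submits it, even though it imports ProjectQuerySubmitter. A sketch of the final step follows; the constructor and submit() arguments shown are assumptions and should be verified against the ProjectQuerySubmitter API before use.

```python
# Hypothetical submission step: the constructor and submit() signatures
# are assumptions; check the ProjectQuerySubmitter class before running.
submitter = ProjectQuerySubmitter(project)
submitter.submit(query)
```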