Dataset Config

DatasetConfigModel

dsgrid.config.dataset_config.DatasetConfigModel

Represents dataset configurations.

Fields

id (int | None, default: None): Registry database ID.

version (str | None, default: None): Version, generated by dsgrid.

dataset_id (str, required): Unique dataset identifier.

data_layout (UserDataLayout | None, default: None): Defines the data layout (table format, value format, and file paths) for dataset registration.

registry_data_layout (RegistryDataLayout | None, default: None): Defines the dataset’s data layout once it is stored in the registry.

dataset_type (InputDatasetType, default: InputDatasetType.UNSPECIFIED): Input dataset type.

dataset_qualifier_metadata (QuantityModel | GrowthRateModel, default: QuantityModel with dataset_qualifier_type=DatasetQualifierType.QUANTITY): Additional metadata related to the dataset qualifier.

description (str | None, default: None): A detailed description of the dataset.

sector_description (str | None, default: None): Sectoral description (e.g., residential, commercial, industrial, transportation, electricity).

data_source (str | None, default: None): Original data source name, e.g., ‘ComStock’, ‘EIA 861’.

data_source_date (str | None, default: None): Date or year the original source data were published, e.g., ‘2021’ for ‘EIA AEO 2021’.

data_source_version (str | None, default: None): Source data version, if applicable. For example, it could distinguish preliminary from final data.

data_source_authors (list[str] | None, default: None): List of authors of the original data source.

data_source_doi_url (str | None, default: None): DOI or other URL of the original data source.

origin_creator (str | None, default: None): First and last name of the person who formatted this dataset for dsgrid.

origin_organization (str | None, default: None): Organization name of the origin_creator, e.g., ‘NREL’.

origin_contributors (list[str] | None, default: None): List of contributors to the compilation of this dataset for dsgrid, e.g., [‘Harry Potter’, ‘Ronald Weasley’].

origin_project (str | None, default: None): Name of the project for/from which this dataset was compiled, e.g., ‘IEF’, ‘Building Standard Scenarios’.

user_defined_metadata (dict[str, Any], default: {}): Additional user-defined metadata fields.

tags (list[str] | None, default: None): List of data tags.

data_classification (DataClassificationType, required): Data security classification (e.g., low, moderate).

enable_unit_conversion (bool, default: True): Set to false if the dataset’s dimension mapping for the metric dimension also performs unit conversion.

use_project_geography_time_zone (bool, default: False): If true, time zones are taken from the project’s geography dimension. If false, the dataset’s geography dimension records must provide a time zone column.

dimensions (list[DimensionModel | DateTimeDimensionModel | AnnualTimeDimensionModel | RepresentativePeriodTimeDimensionModel | DatetimeExternalTimeZoneDimensionModel | IndexTimeDimensionModel | NoOpTimeDimensionModel], default: []): Dimensions of the dataset. They are registered automatically during dataset registration and then converted to dimension_references.

dimension_references (list[DimensionReferenceModel], default: []): References to already-registered dimensions that make up the dataset.

trivial_dimensions (list[DimensionType], default: []): Trivial dimensions, listed by dimension type. A trivial dimension has exactly one record and is not present in the load_data_lookup or the Parquet data columns; dsgrid adds it as an alias column instead.
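For orientation, a dataset config can be pictured as a nested mapping whose keys are the field names above. The sketch below is a hypothetical Python dict, not a file taken from dsgrid: the dataset ID, file path, and dimension choices are illustrative assumptions, and the helper missing_required_fields (also hypothetical) only checks the two fields this page marks as required.

```python
# Hypothetical dataset config as a plain dict. Keys follow the
# DatasetConfigModel field names; all values are illustrative only.
dataset_config = {
    "dataset_id": "example_commercial_load",  # assumed ID
    "data_classification": "low",
    "data_layout": {
        "data_file": {"path": "load_data.parquet"},  # assumed path
        "table_format": "one_table",
        "value_format": "stacked",
    },
    "use_project_geography_time_zone": True,
    "trivial_dimensions": ["subsector"],  # listed by dimension type
}

# Fields shown as "required" in the field list above.
REQUIRED_FIELDS = ("dataset_id", "data_classification")


def missing_required_fields(config: dict) -> list:
    """Return the required DatasetConfigModel fields absent from config."""
    return [name for name in REQUIRED_FIELDS if name not in config]


print(missing_required_fields(dataset_config))  # []
```

Everything else in the model has a default, so a config that passes this check can still be rejected by the validators listed next.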

Validators

check_dataset_id (applies to: check_dataset_id): Checks dataset ID validity.

check_time_not_trivial (applies to: check_time_not_trivial): No description.

check_files (applies to: check_files): Validates that dimension files are unique across all dimensions.

check_names (applies to: check_names): Validates that dimension names are unique across all dimensions.

check_layout_fields (applies to: model): Ensures that data_layout and registry_data_layout are mutually exclusive.

check_time_zone (applies to: model): Validates that the required time zone information is present.
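Two of these rules are simple enough to restate in code. The functions below are hypothetical stand-ins for check_layout_fields and check_names that operate on plain dicts. They are not dsgrid's implementation. They assume that "mutually exclusive" means "not both set" and that each dimension entry carries a "name" key.

```python
from collections import Counter


def check_layout_fields(config: dict) -> None:
    """Reject configs that set both data_layout and registry_data_layout."""
    if (
        config.get("data_layout") is not None
        and config.get("registry_data_layout") is not None
    ):
        raise ValueError("data_layout and registry_data_layout are mutually exclusive")


def check_names(dimensions: list) -> None:
    """Reject dimension lists in which any name appears more than once."""
    counts = Counter(dim["name"] for dim in dimensions)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate dimension names: {duplicates}")


check_layout_fields({"data_layout": {"table_format": "one_table"}})  # passes
check_names([{"name": "county"}, {"name": "subsector"}])  # passes
```

Because the real validators run inside the pydantic model, they would surface as validation errors at load time rather than as bare ValueError calls.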


Column

dsgrid.config.file_schema.Column

Defines one column of a data file schema.

Fields

name (str, required): Name of the column.

dimension_type (DimensionType | None, default: None): Dimension represented by the data in the column. Optional for a time column or a pivoted column. Required if the column holds a stacked dimension under an alternate name, such as ‘county’ instead of ‘geography’. At runtime, dsgrid renames every column for which this field is set and writes the result to the registry’s data directory. The original dataset is not modified.

data_type (str | None, default: None): Type of the data in the column. If None, dsgrid infers the type.

Validators

check_data_type (applies to: check_data_type): No description.
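The dimension_type field is what lets an alternately named column such as ‘county’ stand in for the geography dimension. The sketch below mimics that renaming on a list of header names. It is not dsgrid’s code: the helper name rename_header is hypothetical, and columns are plain dicts shaped like Column entries.

```python
def rename_header(header: list, columns: list) -> list:
    """Rename header entries whose Column entry sets a dimension_type.

    Columns without a dimension_type (for example, time or pivoted
    columns) keep their original names.
    """
    renames = {
        col["name"]: col["dimension_type"]
        for col in columns
        if col.get("dimension_type") is not None
    }
    return [renames.get(name, name) for name in header]


columns = [
    {"name": "county", "dimension_type": "geography"},
    {"name": "timestamp"},  # time column: no dimension_type needed
]
print(rename_header(["timestamp", "county", "value"], columns))
# ['timestamp', 'geography', 'value']
```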


FileSchema

dsgrid.config.file_schema.FileSchema

Defines the format of a data file (CSV, JSON, Parquet).

Fields

path (str | None, required): Path to the file. Must be assigned during registration.

columns (list[Column], default: []): Custom schema for the columns in the file.

ignore_columns (list[str], default: []): List of column names to ignore (drop) when reading the file.

Validators

check_consistency (applies to: model): No description.
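To make ignore_columns concrete, here is a stdlib-only sketch that reads CSV text and drops the listed columns, much as the field description says happens when reading the file. The helper read_rows and the sample data are hypothetical. dsgrid itself reads files through its own machinery, not this function.

```python
import csv
import io


def read_rows(text: str, ignore_columns: list) -> list:
    """Parse CSV text and drop the columns named in ignore_columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: value for key, value in row.items() if key not in ignore_columns}
        for row in reader
    ]


sample = "county,subsector,notes,value\n01001,retail,draft,1.5\n"
print(read_rows(sample, ignore_columns=["notes"]))
# [{'county': '01001', 'subsector': 'retail', 'value': '1.5'}]
```

Note that csv yields every value as a string, which is one reason Column.data_type exists: without it, types have to be inferred.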


GrowthRateModel

dsgrid.config.dataset_config.GrowthRateModel

Qualifier metadata for datasets whose values are growth rates.

Fields

dataset_qualifier_type (Literal, default: DatasetQualifierType.GROWTH_RATE)

growth_rate_type (GrowthRateType, required): Type of growth rates, e.g., exponential_annual.
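This page names exponential_annual but does not define it. Under the conventional reading of an exponential annual rate r, a value compounds once per year: v(year) = v(base_year) · (1 + r)^(year − base_year). The function below is a minimal sketch of that assumption only. It is not taken from dsgrid.

```python
def apply_exponential_annual(
    base_value: float, rate: float, base_year: int, year: int
) -> float:
    """Project base_value to year, compounding rate once per year.

    Assumes the conventional meaning of an exponential annual growth
    rate; dsgrid's own interpretation is not specified on this page.
    """
    return base_value * (1.0 + rate) ** (year - base_year)


projected = apply_exponential_annual(100.0, 0.02, 2020, 2022)  # about 104.04
```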


QuantityModel

dsgrid.config.dataset_config.QuantityModel

Qualifier metadata for datasets whose values are quantities.

Fields

dataset_qualifier_type (Literal, default: DatasetQualifierType.QUANTITY)
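DatasetConfigModel.dataset_qualifier_metadata accepts either QuantityModel or GrowthRateModel, and the dataset_qualifier_type literal tells the two apart. The sketch below shows that dispatch with stdlib dataclasses standing in for the pydantic models. The class names are hypothetical. The string "quantity" comes from the default shown above, and "growth_rate" is an assumed enum value.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class QuantityQualifier:
    dataset_qualifier_type: str = "quantity"


@dataclass
class GrowthRateQualifier:
    growth_rate_type: str
    dataset_qualifier_type: str = "growth_rate"  # assumed enum value


QualifierMetadata = Union[QuantityQualifier, GrowthRateQualifier]


def parse_qualifier(data: Optional[dict] = None) -> QualifierMetadata:
    """Pick the qualifier model from dataset_qualifier_type (default: quantity)."""
    if not data:
        return QuantityQualifier()  # mirrors the documented default
    kind = data.get("dataset_qualifier_type", "quantity")
    if kind == "quantity":
        return QuantityQualifier()
    if kind == "growth_rate":
        return GrowthRateQualifier(growth_rate_type=data["growth_rate_type"])
    raise ValueError(f"unknown dataset_qualifier_type: {kind!r}")


meta = parse_qualifier(
    {"dataset_qualifier_type": "growth_rate", "growth_rate_type": "exponential_annual"}
)
```

In pydantic, the same effect is usually achieved declaratively with a discriminated union on the literal field, rather than with a hand-written parser.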


RegistryDataLayout

dsgrid.config.dataset_config.RegistryDataLayout

Data layout stored in the dsgrid registry (without file paths).

Fields

table_format (TableFormat, required): Table structure: one_table or two_table.

value_format (ValueFormat, required): Value column format: stacked or pivoted.

pivoted_dimension_type (DimensionType | None, default: None): The dimension type whose records are columns when pivoted.

Validators

validate_layout (applies to: model): Validates data layout consistency.
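A RegistryDataLayout is described as the registered layout without file paths, and its three fields also appear on UserDataLayout. The sketch below derives one from a UserDataLayout-shaped dict by keeping only those three keys. The function to_registry_layout is hypothetical and is not part of dsgrid's API.

```python
# The three fields RegistryDataLayout defines (see the field list above).
REGISTRY_LAYOUT_KEYS = ("table_format", "value_format", "pivoted_dimension_type")


def to_registry_layout(user_layout: dict) -> dict:
    """Keep only the RegistryDataLayout fields, dropping file-related keys."""
    return {key: user_layout.get(key) for key in REGISTRY_LAYOUT_KEYS}


user_layout = {
    "data_file": {"path": "load_data.parquet"},  # illustrative path
    "table_format": "one_table",
    "value_format": "pivoted",
    "pivoted_dimension_type": "metric",
}
print(to_registry_layout(user_layout))
# {'table_format': 'one_table', 'value_format': 'pivoted', 'pivoted_dimension_type': 'metric'}
```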


UserDataLayout

dsgrid.config.dataset_config.UserDataLayout

User-defined data layout for dataset registration.

Fields

data_file (FileSchema, required): Defines the data file.

lookup_data_file (FileSchema | None, default: None): Defines the lookup data file. Required if the table format is ‘two_table’.

missing_associations (list[str], default: []): List of paths to missing associations files (e.g., missing_associations.parquet) or directories of files containing missing combinations by dimension type (e.g., geography__subsector.csv, subsector__metric.csv).

table_format (TableFormat, required): Table structure: one_table (all data in a single table) or two_table (time series data separate from lookup metadata).

value_format (ValueFormat, required): Value column format: stacked (single value column) or pivoted (one dimension’s records as columns).

pivoted_dimension_type (DimensionType | None, default: None): The dimension type whose records are columns (pivoted) that contain data values. Required when value_format is ‘pivoted’.
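The example file names in the missing_associations description suggest a naming pattern of dimension types joined by a double underscore. Assuming that pattern holds, the hypothetical helper below recovers the dimension types a per-combination file covers. It is a reading aid, not dsgrid's parser.

```python
from pathlib import Path


def dimension_types_from_filename(path: str) -> tuple:
    """Split a name like 'geography__subsector.csv' into dimension types.

    Assumes the <type>__<type> naming pattern shown in the
    missing_associations examples.
    """
    return tuple(Path(path).stem.split("__"))


print(dimension_types_from_filename("missing/geography__subsector.csv"))
# ('geography', 'subsector')
```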

Validators

validate_layout (applies to: model): Validates data layout consistency.
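The field list for UserDataLayout states two consistency rules: two_table layouts need a lookup_data_file, and pivoted layouts need a pivoted_dimension_type. The sketch below checks only those two documented rules on a plain dict. It is a hypothetical stand-in, and the real validate_layout may enforce more.

```python
def validate_layout(layout: dict) -> None:
    """Check the two consistency rules stated in the UserDataLayout fields."""
    if layout.get("table_format") == "two_table" and layout.get("lookup_data_file") is None:
        raise ValueError("lookup_data_file is required when table_format is 'two_table'")
    if layout.get("value_format") == "pivoted" and layout.get("pivoted_dimension_type") is None:
        raise ValueError("pivoted_dimension_type is required when value_format is 'pivoted'")


validate_layout(
    {
        "data_file": {"path": "load_data.parquet"},  # illustrative path
        "table_format": "one_table",
        "value_format": "stacked",
    }
)  # passes
```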