Dataset Config
DatasetConfigModel
dsgrid.config.dataset_config.DatasetConfigModel
Represents dataset configurations.
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | | Registry database ID |
| | | | Version, generated by dsgrid |
| | | (required) | Unique dataset identifier. |
| | UserDataLayout \| None | | Defines the data layout (table format, value format, and file paths) for dataset registration. |
| | RegistryDataLayout \| None | | Defines the dataset’s data layout once stored in the registry. |
| | | | Input dataset type. |
| | | | Additional metadata related to the dataset_qualifier. |
| | | | A detailed description of the dataset. |
| | | | Sectoral description (e.g., residential, commercial, industrial, transportation, electricity). |
| | | | Original data source name, e.g., ‘ComStock’, ‘EIA 861’. |
| | | | Date or year the original source data were published, e.g., ‘2021’ for ‘EIA AEO 2021’. |
| | | | Source data version, if applicable. For example, this could distinguish preliminary from final data. |
| | list[ | | List of authors for the original data source. |
| | | | DOI or other URL of the original data source. |
| | | | First and last name of the person who formatted this dataset for dsgrid. |
| | | | Organization name of the origin_creator, e.g., ‘NREL’. |
| | list[ | | List of contributors to the compilation of this dataset for dsgrid, e.g., [‘Harry Potter’, ‘Ronald Weasley’]. |
| | | | Name of the project for/from which this dataset was compiled, e.g., ‘IEF’, ‘Building Standard Scenarios’. |
| | dict[ | | Additional user-defined metadata fields. |
| | list[ | | List of data tags. |
| | | (required) | Data security classification (e.g., low, moderate). |
| | | | If the dataset uses its dimension mapping for the metric dimension to also perform unit conversion, then this value should be false. |
| | | | If true, time zones will be applied from the project’s geography dimension. If false, the dataset’s geography dimension records must provide a time zone column. |
| | list[DimensionModel \| DateTimeDimensionModel \| AnnualTimeDimensionModel \| RepresentativePeriodTimeDimensionModel \| DatetimeExternalTimeZoneDimensionModel \| IndexTimeDimensionModel \| NoOpTimeDimensionModel] | | List of dimensions that make up the dataset. They are registered automatically during dataset registration and then converted to dimension_references. |
| | list[DimensionReferenceModel] | | List of registered dimension references that make up the dimensions of the dataset. |
| | list[DimensionType] | | List of trivial dimensions, by dimension type. Trivial dimensions are 1-element dimensions that do not exist in the load_data_lookup and are not present in the Parquet data columns; dsgrid adds each one as an alias column. |
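
The trivial-dimension behavior described in the last row can be pictured with a short sketch in plain Python. This is not dsgrid's implementation: the helper name `add_trivial_dimension`, the list-of-dicts row representation, and the sample record IDs are all assumptions made for illustration.

```python
def add_trivial_dimension(rows, dimension_type, record_id):
    """Return copies of rows with a constant column for a 1-element dimension.

    Hypothetical helper: it mimics how a trivial dimension, which is absent
    from the data columns, could be added as a constant column.
    """
    if any(dimension_type in row for row in rows):
        raise ValueError(f"{dimension_type!r} is already a data column")
    # Build new dicts so the input rows are left untouched.
    return [{**row, dimension_type: record_id} for row in rows]


# Made-up sample data: there is no "sector" column because the dataset
# covers a single sector.
rows = [
    {"geography": "06037", "value": 1.5},
    {"geography": "06059", "value": 2.0},
]
with_sector = add_trivial_dimension(rows, "sector", "com")
```

Each row in `with_sector` carries the same `sector` value, while the original `rows` list is unchanged.
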
Validators
| Name | Applies To | Description |
|---|---|---|
| | | Check dataset ID validity. |
| | | No description |
| | | Validate dimension files are unique across all dimensions. |
| | | Validate dimension names are unique across all dimensions. |
| | | Ensure data_layout and registry_data_layout are mutually exclusive. |
| | | Validate whether required time zone information is present. |
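
As a rough illustration of two of these checks (mutual exclusivity of the layouts and unique dimension names), the sketch below uses a standard-library dataclass instead of the real model. The class name `DatasetConfigSketch` and the fields `dataset_id` and `dimension_names` are hypothetical; only `data_layout` and `registry_data_layout` are names taken from the validator descriptions above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in for DatasetConfigModel; not the dsgrid class."""

    dataset_id: str
    data_layout: Optional[dict] = None
    registry_data_layout: Optional[dict] = None
    dimension_names: list = field(default_factory=list)

    def __post_init__(self):
        # A config may carry a user layout or a registry layout, not both.
        if self.data_layout is not None and self.registry_data_layout is not None:
            raise ValueError(
                "data_layout and registry_data_layout are mutually exclusive"
            )
        # Dimension names must be unique across all dimensions.
        seen = set()
        duplicates = set()
        for name in self.dimension_names:
            if name in seen:
                duplicates.add(name)
            seen.add(name)
        if duplicates:
            raise ValueError(f"duplicate dimension names: {sorted(duplicates)}")
```

Constructing the sketch with both layouts set, or with a repeated dimension name, raises `ValueError`. The real validators may apply different or additional rules.
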
Column
dsgrid.config.file_schema.Column
Base data model for all dsgrid data models
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | (required) | Name of the column. |
| | DimensionType \| None | | Dimension represented by the data in the column. Optional if this is a time column or pivoted column. Required if the column represents a stacked dimension but uses an alternate name, such as ‘county’ instead of ‘geography’. At runtime, dsgrid renames any column for which this is set and writes the result to the registry’s data directory. The original dataset is not modified. |
| | | | Type of the data in the column. If None, infer the type. |
Validators
| Name | Applies To | Description |
|---|---|---|
| | | No description |
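
The column renaming described in the Fields table (for example, a ‘county’ column that represents the geography dimension) can be sketched as follows. The helper `rename_dimension_columns` and the list-of-dicts row format are assumptions for illustration only; dsgrid's actual renaming logic is not shown in this document.

```python
def rename_dimension_columns(rows, column_dimension_types):
    """Rename columns to their dimension type names without modifying the input.

    column_dimension_types maps a column name in the file to the dimension
    type it represents, e.g. {"county": "geography"}. Hypothetical helper.
    """
    return [
        {column_dimension_types.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Made-up sample row using an alternate column name for the geography dimension.
original = [{"county": "06037", "value": 1.5}]
renamed = rename_dimension_columns(original, {"county": "geography"})
```

Here `renamed` uses the `geography` column name, and `original` keeps its `county` column, mirroring the statement that the original dataset is not modified.
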
FileSchema
dsgrid.config.file_schema.FileSchema
Defines the format of a data file (CSV, JSON, Parquet).
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | (required) | Path to the file. Must be assigned during registration. |
| | list[Column] | | Custom schema for the columns in the file. |
| | list[ | | List of column names to ignore (drop) when reading the file. |
Validators
| Name | Applies To | Description |
|---|---|---|
| | | No description |
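
To make the effect of a custom column schema and an ignore list concrete, here is a minimal sketch that reads CSV text with the standard library. The function `read_csv_with_schema` and its parameter names are hypothetical and do not reflect dsgrid's reader. One simplification is called out in the code: columns without a declared type stay as strings, whereas the documentation above says dsgrid infers the type.

```python
import csv
import io


def read_csv_with_schema(text, column_types=None, ignore_columns=()):
    """Read CSV text, drop ignored columns, and cast columns with a declared type.

    Hypothetical helper. Columns without a declared type are left as strings
    in this sketch (no type inference is attempted).
    """
    column_types = column_types or {}
    ignored = set(ignore_columns)
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, raw in record.items():
            if name in ignored:
                continue  # dropped when reading the file
            cast = column_types.get(name, str)
            row[name] = cast(raw)
        rows.append(row)
    return rows


# Made-up sample file contents.
sample = "county,value,notes\n06037,1.5,draft\n06059,2.0,final\n"
rows = read_csv_with_schema(
    sample, column_types={"value": float}, ignore_columns=["notes"]
)
```

The `notes` column is dropped, `value` is parsed as a float, and `county` remains a string so that leading zeros are preserved.
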
GrowthRateModel
dsgrid.config.dataset_config.GrowthRateModel
Base data model for all dsgrid data models
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | | |
| | | (required) | Type of growth rates, e.g., exponential_annual |
QuantityModel
dsgrid.config.dataset_config.QuantityModel
Base data model for all dsgrid data models
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | | |
RegistryDataLayout
dsgrid.config.dataset_config.RegistryDataLayout
Data layout stored in the dsgrid registry (without file paths).
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | (required) | Table structure: one_table or two_table. |
| | | (required) | Value column format: stacked or pivoted. |
| | DimensionType \| None | | The dimension type whose records are columns when pivoted. |
Validators
| Name | Applies To | Description |
|---|---|---|
| | | Validate data layout consistency. |
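
The difference between the stacked and pivoted value formats may be easier to see with data. The sketch below converts pivoted rows (one column per record of the pivoted dimension) into stacked rows (a single value column). The helper name, the `value` column name, and the sample records are assumptions for illustration, not dsgrid's conversion code.

```python
def stack_pivoted_rows(rows, pivoted_columns, pivoted_dimension_type, value_column="value"):
    """Convert pivoted rows into stacked rows.

    Each pivoted column becomes one output row whose dimension column holds
    the former column name. Hypothetical helper.
    """
    stacked = []
    for row in rows:
        # Columns that are not pivoted are carried over unchanged.
        shared = {k: v for k, v in row.items() if k not in pivoted_columns}
        for column in pivoted_columns:
            stacked.append(
                {**shared, pivoted_dimension_type: column, value_column: row[column]}
            )
    return stacked


# Made-up pivoted row: the metric dimension's records are the column names.
pivoted = [{"geography": "06037", "cooling": 1.0, "heating": 2.0}]
stacked = stack_pivoted_rows(pivoted, ["cooling", "heating"], "metric")
```

In this example the single pivoted row yields two stacked rows, one for `cooling` and one for `heating`, each with a `metric` column and a `value` column.
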
UserDataLayout
dsgrid.config.dataset_config.UserDataLayout
User-defined data layout for dataset registration.
Fields
| Name | Type | Default | Description |
|---|---|---|---|
| | | (required) | Defines the data file. |
| | FileSchema \| None | | Defines the lookup data file. Required if the table format is ‘two_table’. |
| | list[ | | List of paths to missing associations files (e.g., missing_associations.parquet) or directories of files containing missing combinations by dimension type (e.g., geography__subsector.csv, subsector__metric.csv). |
| | | (required) | Table structure: one_table (all data in single table) or two_table (time series data separate from lookup metadata). |
| | | (required) | Value column format: stacked (single value column) or pivoted (one dimension’s records as columns). |
| | DimensionType \| None | | The dimension type whose records are columns (pivoted) that contain data values. Required when value_format is ‘pivoted’. |
Validators
| Name | Applies To | Description |
|---|---|---|
| | | Validate data layout consistency. |
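
The consistency rules stated in the Fields table (a lookup file is required for ‘two_table’, and a pivoted dimension type is required for ‘pivoted’) can be sketched with a standard-library dataclass. The class `UserDataLayoutSketch` and every field name except `value_format` are hypothetical, and the real validator may enforce further rules that are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserDataLayoutSketch:
    """Hypothetical stand-in for UserDataLayout; not the dsgrid class."""

    table_format: str
    value_format: str
    data_file: str
    lookup_data_file: Optional[str] = None
    pivoted_dimension_type: Optional[str] = None

    def __post_init__(self):
        if self.table_format not in ("one_table", "two_table"):
            raise ValueError(f"unknown table format: {self.table_format!r}")
        if self.value_format not in ("stacked", "pivoted"):
            raise ValueError(f"unknown value format: {self.value_format!r}")
        # The lookup data file is required for the two-table structure.
        if self.table_format == "two_table" and self.lookup_data_file is None:
            raise ValueError("two_table requires a lookup data file")
        # The pivoted dimension type is required for the pivoted value format.
        if self.value_format == "pivoted" and self.pivoted_dimension_type is None:
            raise ValueError("pivoted requires a pivoted dimension type")
```

For example, `UserDataLayoutSketch("one_table", "stacked", "load_data.parquet")` passes, while a `two_table` layout without a lookup file raises `ValueError`. The file name is illustrative.
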