# Output Formats
Query results can be customized through parameters in QueryResultParamsModel, which controls table structure, file format, data organization, and post-processing. This guide explains how each parameter influences the output.
## Overview of QueryResultParamsModel
The QueryResultParamsModel class provides fine-grained control over how query results are formatted and stored. Parameters are passed when executing a query and determine:
- Table structure — how rows and columns are organized (stacked vs. pivoted)
- File format — CSV or Parquet output
- Column naming — dimension names vs. dimension types
- Data filtering and sorting — which rows appear and in what order
- Time zone conversion — localization to specific time zones
- Aggregation behavior — how dimensions are rolled up
- Additional reports — pre-defined analysis outputs
## Table Format: Stacked vs. Pivoted
The table_format parameter controls the overall structure of your result table.
### Stacked Format (Default)

```python
table_format=StackedTableFormatModel()
```
Structure: Long, narrow table with all dimensions as columns (plus one value column).
| geography | subsector | time | value |
|---|---|---|---|
| CA | Industrial | 2020-01-01 | 1200.5 |
| CA | Industrial | 2020-01-02 | 1195.3 |
| CA | Residential | 2020-01-01 | 450.2 |
| TX | Industrial | 2020-01-01 | 2100.8 |
Use when: Performing time-series analysis, filtering on multiple dimensions, or running statistical calculations. This is the most practical format for the majority of analyses.
### Pivoted Format

```python
table_format=PivotedTableFormatModel(
    dimension_to_pivot="subsector"  # dimension whose records become column headers
)
```
Structure: Wider table where one dimension’s records become column headers, with values spread across columns.
| geography | time | Industrial | Residential |
|---|---|---|---|
| CA | 2020-01-01 | 1200.5 | 450.2 |
| CA | 2020-01-02 | 1195.3 | 448.7 |
| TX | 2020-01-01 | 2100.8 | 620.5 |
Use when: Comparing values across a dimension (e.g., subsector-to-subsector comparison), creating wide comparison matrices, or preparing data for spreadsheet analysis.
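The relationship between the two formats can be sketched with pandas (an assumption here for illustration; dsgrid itself may build these tables differently, e.g. with Spark). Pivoting the stacked sample above on subsector reproduces the pivoted layout:

```python
import pandas as pd

# The stacked (long) sample table from above.
stacked = pd.DataFrame({
    "geography": ["CA", "CA", "CA", "TX"],
    "subsector": ["Industrial", "Industrial", "Residential", "Industrial"],
    "time": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-01"],
    "value": [1200.5, 1195.3, 450.2, 2100.8],
})

# Spreading "subsector" across column headers mirrors
# PivotedTableFormatModel(dimension_to_pivot="subsector").
pivoted = stacked.pivot_table(
    index=["geography", "time"], columns="subsector", values="value"
).reset_index()
```

Combinations missing from the stacked input (e.g. TX Residential here) appear as NaN in the pivoted result.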
## Column Naming: Dimension Names vs. Types
The column_type parameter controls whether columns use actual dimension names or generic type labels.
### Dimension Names (Default)

```python
column_type=ColumnType.DIMENSION_NAMES
```
Columns use actual dimension names from your project: geography, subsector, time, value.
Use when: Reading results directly or performing ad-hoc analysis where dimension names provide context.
### Dimension Types

```python
column_type=ColumnType.DIMENSION_TYPES
```
Columns use generic type names: geography_type, subsector_type, time_type, value.
Use when: Registering query results as a derived dataset (required). This naming convention ensures compatibility with the dsgrid data registration system.
## File Format
The output_format parameter controls the output file format.
```python
output_format="parquet"  # or "csv"
```
| Format | Advantages | Disadvantages | Use When |
|---|---|---|---|
| parquet (default) | Efficient compression, preserves data types, scales to large files | Requires specialized tools to read | Results are large or will be processed further |
| csv | Universal, human-readable, opens in spreadsheet applications | Large file sizes, loses data type information | Results are small or for sharing with others |
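The CSV caveat is easy to demonstrate (a pandas sketch, assuming results are read back with pandas): a CSV file stores bare text, so dimension codes with leading zeros are re-inferred as integers on read, while a typed format keeps the original schema.

```python
import io

import pandas as pd

df = pd.DataFrame({"geography": ["06001", "06002"], "value": [1200.5, 950.3]})

# Round-trip through CSV: dtypes are re-inferred from the text.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
from_csv = pd.read_csv(buf)

# "06001" comes back as the integer 6001 unless dtypes are forced,
# e.g. pd.read_csv(..., dtype={"geography": str}).
```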
## Replace IDs with Names
The replace_ids_with_names parameter controls whether dimension record IDs or names appear in results.
### Default Behavior (False)

```python
replace_ids_with_names=False
```
Results contain dimension record IDs:
| geography_id | subsector_id | value |
|---|---|---|
| 06001 | 110 | 1200.5 |
| 06002 | 110 | 950.3 |
### With Name Replacement (True)

```python
replace_ids_with_names=True
```
Results contain human-readable names:
| geography_name | subsector_name | value |
|---|---|---|
| Alameda County | Oil and gas extraction | 1200.5 |
| Alpine County | Oil and gas extraction | 950.3 |
Note: This is purely cosmetic and doesn’t affect data values. Choose based on whether you need human-readable output or IDs for downstream processing.
## Aggregation Control
The aggregations parameter and aggregate_each_dataset flag control how dimensions are rolled up in your results.
### Aggregating Dimensions
Use aggregations to roll up (group) dimensions:
```python
aggregations=[
    AggregationModel(
        dimension="subsector",
        aggregation_function="sum",
    ),
    AggregationModel(
        dimension="time",
        aggregation_function="mean",
    ),
]
```
This removes subsector and time as columns and instead aggregates values across all subsectors (summing) and all times (averaging).
Result: Output has fewer columns (geography only, plus value), with rolled-up values.
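Conceptually, this roll-up behaves like a two-step group-by. Here is a pandas sketch on the stacked sample from earlier (illustrative only; the exact execution order inside dsgrid may differ): sum across subsectors within each (geography, time) cell, then average across time.

```python
import pandas as pd

stacked = pd.DataFrame({
    "geography": ["CA", "CA", "CA", "TX"],
    "subsector": ["Industrial", "Industrial", "Residential", "Industrial"],
    "time": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-01"],
    "value": [1200.5, 1195.3, 450.2, 2100.8],
})

# Step 1: sum over subsector -> one row per (geography, time).
per_time = stacked.groupby(["geography", "time"], as_index=False)["value"].sum()

# Step 2: mean over time -> one row per geography.
rolled = per_time.groupby("geography", as_index=False)["value"].mean()
```

For CA this yields (1200.5 + 450.2 + 1195.3) / 2 = 1423.0: the two subsector rows on 2020-01-01 are summed first, then the two daily totals are averaged.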
### Aggregating Individual Datasets vs. Combined Result
The aggregate_each_dataset parameter determines when aggregation happens:
```python
aggregate_each_dataset=False  # default: aggregate after combining datasets
aggregate_each_dataset=True   # aggregate each dataset separately first
```
When combining datasets with different dimensionality (union followed by aggregation), set aggregate_each_dataset: True to avoid duplicate rows.
## Sorting Results
The sort_columns parameter controls the row order in your output:
```python
sort_columns=["geography", "time"]
```
Results are sorted first by geography, then by time. This makes output easier to scan and analyze.
Note: Column order in the output follows dimension order, while sorting affects row order.
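As a small sketch (pandas assumed, purely for illustration), sorting rows by geography and then time corresponds to a multi-column sort where the first column is the primary key and later columns break ties:

```python
import pandas as pd

unsorted = pd.DataFrame({
    "geography": ["TX", "CA", "CA"],
    "time": ["2020-01-01", "2020-01-02", "2020-01-01"],
    "value": [2100.8, 1195.3, 1200.5],
})

# sort_columns=["geography", "time"]: geography first, then time within it.
ordered = unsorted.sort_values(["geography", "time"]).reset_index(drop=True)
```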
## Time Zone Conversion
The time_zone parameter converts timestamps to a specific time zone:
```python
time_zone="America/New_York"  # convert to a specific IANA time zone
time_zone="geography"         # convert to each geography's time zone
time_zone=None                # keep original timestamps (no conversion)
```
When time zone conversion is applied:

- Timestamps are converted to the specified zone
- The time column becomes timezone-naive (no UTC offset embedded)
- A separate time_zone column records the time zone for each row
Example: If your data is in UTC and you set time_zone="America/Los_Angeles", timestamps shift 7 or 8 hours earlier, depending on daylight saving time.
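The three effects above can be sketched with pandas (an assumption for illustration; dsgrid's own time handling may differ in detail): convert to the target zone, strip the offset so the column is naive, and record the zone in its own column.

```python
import pandas as pd

# UTC timestamps as they might appear in a raw result (one winter, one summer).
utc_times = pd.Series(
    pd.to_datetime(["2020-01-01 08:00", "2020-07-01 07:00"], utc=True)
)

local = utc_times.dt.tz_convert("America/Los_Angeles")  # shift to the target zone
converted = pd.DataFrame({
    "time": local.dt.tz_localize(None),  # drop the offset: timezone-naive
    "time_zone": "America/Los_Angeles",  # zone recorded in a separate column
})
```

Both sample timestamps land on local midnight: the winter one shifts 8 hours (PST), the summer one 7 hours (PDT).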
## Filtering Results
The dimension_filters parameter applies post-processing filters to the query result:
```python
dimension_filters=[
    DimensionFilters(
        dimension="geography",
        records=["06001", "06003", "06005"],  # only these counties
    ),
    DimensionFilters(
        dimension="subsector",
        records=["110", "120"],  # only these subsectors
    ),
]
```
Filters are applied after the query executes, removing rows that don't match. Unlike query-level filters, which reduce computation, post-processing filters are useful for extracting subsets from a finished result.
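In pandas terms (an illustrative assumption, not dsgrid's actual implementation), the post-processing step amounts to an isin mask per dimension, combined with AND:

```python
import pandas as pd

result = pd.DataFrame({
    "geography": ["06001", "06002", "06003", "06005"],
    "subsector": ["110", "120", "110", "130"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Each DimensionFilters entry keeps only the listed records;
# a row must satisfy every filter to survive.
keep = (
    result["geography"].isin(["06001", "06003", "06005"])
    & result["subsector"].isin(["110", "120"])
)
filtered = result[keep]
```

Here 06002 fails the geography filter and 06005's subsector 130 fails the subsector filter, so only two rows remain.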
## Reports
The reports parameter generates pre-defined analysis reports on the query result:
```python
reports=[
    ReportInputModel(report_name="peak_load"),
    ReportInputModel(report_name="seasonal_pattern"),
]
```
Each report generates additional output files with specific analyses (e.g., peak load analysis, seasonal breakdowns). Available reports depend on your dsgrid installation.
## Complete Example
Here’s a complete QueryResultParamsModel configuration:
```python
from dsgrid.query.models import (
    QueryResultParamsModel,
    AggregationModel,
    DimensionFilters,
    ColumnType,
    StackedTableFormatModel,
    ReportInputModel,
)

result_params = QueryResultParamsModel(
    # Table structure
    table_format=StackedTableFormatModel(),
    column_type=ColumnType.DIMENSION_NAMES,
    # Output format
    output_format="parquet",
    # Data presentation
    replace_ids_with_names=True,
    sort_columns=["geography", "time"],
    # Filtering and aggregation
    aggregations=[
        AggregationModel(
            dimension="subsector",
            aggregation_function="sum",
        )
    ],
    aggregate_each_dataset=False,
    dimension_filters=[
        DimensionFilters(
            dimension="geography",
            records=["06001", "06003"],
        )
    ],
    # Time zone conversion
    time_zone="America/Los_Angeles",
    # Additional analysis
    reports=[
        ReportInputModel(report_name="peak_load")
    ],
)
```
This configuration produces a Parquet file with:
- Stacked rows (long format)
- Subsectors aggregated (summed)
- Only California counties 06001 and 06003
- Timestamps converted to America/Los_Angeles
- A peak load report generated
- Rows sorted by geography, then time
- Human-readable dimension names