Output Formats

Query results can be customized through parameters in QueryResultParamsModel, which controls table structure, file format, data organization, and post-processing. This guide explains how each parameter influences the output.

Overview of QueryResultParamsModel

The QueryResultParamsModel class provides fine-grained control over how query results are formatted and stored. Parameters are passed when executing a query and determine:

  • Table structure — how rows and columns are organized (stacked vs. pivoted)

  • File format — CSV or Parquet output

  • Column naming — dimension names vs. dimension types

  • Data filtering and sorting — which rows appear and in what order

  • Time zone conversion — localization to specific time zones

  • Aggregation behavior — how dimensions are rolled up

  • Additional reports — pre-defined analysis outputs


Table Format: Stacked vs. Pivoted

The table_format parameter controls the overall structure of your result table.

Stacked Format (Default)

table_format: StackedTableFormatModel()

Structure: Long, narrow table with all dimensions as columns (plus one value column).

geography   subsector     time         value
CA          Industrial    2020-01-01   1200.5
CA          Industrial    2020-01-02   1195.3
CA          Residential   2020-01-01   450.2
TX          Industrial    2020-01-01   2100.8

Use when: Performing time-series analysis, filtering by multiple dimensions, or computing statistics. This long format suits most analyses.
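To make the "filtering by multiple dimensions" point concrete, here is a minimal plain-Python sketch (not the dsgrid implementation) using the rows from the stacked table above. Because every dimension is an ordinary column, filters compose naturally:

```python
# Stacked (long) rows: one value per row, all dimensions as columns.
rows = [
    {"geography": "CA", "subsector": "Industrial", "time": "2020-01-01", "value": 1200.5},
    {"geography": "CA", "subsector": "Industrial", "time": "2020-01-02", "value": 1195.3},
    {"geography": "CA", "subsector": "Residential", "time": "2020-01-01", "value": 450.2},
    {"geography": "TX", "subsector": "Industrial", "time": "2020-01-01", "value": 2100.8},
]

# Filtering on any combination of dimensions is a simple row predicate.
ca_industrial = [
    r for r in rows
    if r["geography"] == "CA" and r["subsector"] == "Industrial"
]
```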

Pivoted Format

table_format: PivotedTableFormatModel(
    dimension_to_pivot: "subsector"  # dimension whose records become column headers
)

Structure: Wider table where one dimension’s records become column headers, with values spread across columns.

geography   time         Industrial   Residential
CA          2020-01-01   1200.5       450.2
CA          2020-01-02   1195.3       448.7
TX          2020-01-01   2100.8       620.5

Use when: Comparing values across a dimension (e.g., subsector-to-subsector comparison), creating wide comparison matrices, or preparing data for spreadsheet analysis.
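The pivot itself can be sketched in a few lines of plain Python (illustrative only, not the dsgrid implementation): one dimension's records become keys of an inner mapping, indexed by the remaining dimensions.

```python
# Stacked input rows, as in the earlier example.
rows = [
    {"geography": "CA", "subsector": "Industrial", "time": "2020-01-01", "value": 1200.5},
    {"geography": "CA", "subsector": "Residential", "time": "2020-01-01", "value": 450.2},
]

# Pivot "subsector": group by the remaining dimensions, spread records
# across columns of the inner dict.
pivoted = {}
for r in rows:
    key = (r["geography"], r["time"])
    pivoted.setdefault(key, {})[r["subsector"]] = r["value"]
```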


Column Naming: Dimension Names vs. Types

The column_type parameter controls whether columns use actual dimension names or generic type labels.

Dimension Names (Default)

column_type: ColumnType.DIMENSION_NAMES

Columns use actual dimension names from your project: geography, subsector, time, value.

Use when: Reading results directly or performing ad-hoc analysis where dimension names provide context.

Dimension Types

column_type: ColumnType.DIMENSION_TYPES

Columns use generic type names: geography_type, subsector_type, time_type, value.

Use when: Registering query results as a derived dataset, which requires this convention. Type-based naming ensures compatibility with the dsgrid data registration system.


File Format

The output_format parameter controls the output file format.

output_format: "parquet"  # or "csv"

parquet (default)
  Advantages:    efficient compression, preserves data types, scales to large files
  Disadvantages: requires specialized tools to read
  Use when:      results are large or will be processed further

csv
  Advantages:    universal, human-readable, opens in spreadsheet applications
  Disadvantages: large file sizes, loses data type information
  Use when:      results are small or will be shared with others
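The "loses data type information" caveat for CSV is easy to demonstrate with Python's standard csv module: values round-trip as strings, so numeric types must be re-inferred on read.

```python
import csv
import io

# Write one numeric value to an in-memory CSV.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["geography", "value"])
writer.writerow(["CA", 1200.5])

# Read it back: the float comes back as the string "1200.5".
buf.seek(0)
row = list(csv.reader(buf))[1]
# row is now ["CA", "1200.5"]
```

Parquet, by contrast, stores a typed schema alongside the data, so a float column is still a float column when read back.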


Replace IDs with Names

The replace_ids_with_names parameter controls whether dimension record IDs or names appear in results.

Default Behavior (False)

replace_ids_with_names: False

Results contain dimension record IDs:

geography_id   subsector_id   value
06001          110            1200.5
06002          110            950.3

With Names Replacement (True)

replace_ids_with_names: True

Results contain human-readable names:

geography_name   subsector_name           value
Alameda County   Oil and gas extraction   1200.5
Alpine County    Oil and gas extraction   950.3

Note: This is purely cosmetic and doesn’t affect data values. Choose based on whether you need human-readable output or IDs for downstream processing.
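Conceptually, the replacement is a lookup against the project's dimension records. A minimal sketch in plain Python (the mapping entries here are illustrative, not real project data):

```python
# Hypothetical ID-to-name mapping; in dsgrid this comes from the
# project's registered dimension records.
geography_names = {"06001": "Alameda County", "06002": "Alpine County"}

rows = [{"geography_id": "06001", "value": 1200.5}]

# Swap the ID column for a name column; values are untouched.
named = [
    {"geography_name": geography_names[r["geography_id"]], "value": r["value"]}
    for r in rows
]
```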


Aggregation Control

The aggregations parameter and aggregate_each_dataset flag control how dimensions are rolled up in your results.

Aggregating Dimensions

Use aggregations to roll up (group) dimensions:

aggregations: [
    AggregationModel(
        dimension: "subsector",
        aggregation_function: "sum"
    ),
    AggregationModel(
        dimension: "time",
        aggregation_function: "mean"
    )
]

This removes subsector and time as columns and instead aggregates values across all subsectors (summing) and all times (averaging).

Result: Output has fewer columns (geography only, plus value), with rolled-up values.
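A "sum" aggregation over one dimension is an ordinary group-by: group rows by the dimensions that remain and combine values within each group. A plain-Python sketch (not the dsgrid implementation):

```python
from collections import defaultdict

rows = [
    {"geography": "CA", "subsector": "Industrial", "value": 1200.5},
    {"geography": "CA", "subsector": "Residential", "value": 450.2},
    {"geography": "TX", "subsector": "Industrial", "value": 2100.8},
]

# Roll up "subsector": group by the remaining dimension (geography)
# and sum values within each group.
totals = defaultdict(float)
for r in rows:
    totals[r["geography"]] += r["value"]

# The subsector column is gone; CA rows sum to ~1650.7.
```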

Aggregating Individual Datasets vs. Combined Result

The aggregate_each_dataset parameter determines when aggregation happens:

aggregate_each_dataset: False  # Default: aggregate after combining datasets
aggregate_each_dataset: True   # Aggregate each dataset separately first

When combining datasets with different dimensionality (union followed by aggregation), set aggregate_each_dataset: True to avoid duplicate rows.


Sorting Results

The sort_columns parameter controls the row order in your output:

sort_columns: ["geography", "time"]

Results are sorted first by geography, then by time. This makes output easier to scan and analyze.

Note: Column order in the output follows dimension order, while sorting affects row order.
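The multi-column ordering behaves like Python's sorted() with a tuple key, sorting by the first column and breaking ties with the next. A small sketch:

```python
from operator import itemgetter

rows = [
    {"geography": "TX", "time": "2020-01-01"},
    {"geography": "CA", "time": "2020-01-02"},
    {"geography": "CA", "time": "2020-01-01"},
]

# Sort by geography first, then by time within each geography,
# mirroring sort_columns=["geography", "time"].
ordered = sorted(rows, key=itemgetter("geography", "time"))
```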


Time Zone Conversion

The time_zone parameter converts timestamps to a specific time zone:

time_zone: "America/New_York"      # Convert to specific IANA timezone
time_zone: "geography"              # Convert to each geography's time zone
time_zone: None                     # Keep original (no conversion)

When time zone conversion is applied:

  • Timestamps are converted to the specified zone

  • The time column becomes timezone-naive (no UTC offset embedded)

  • A separate time_zone column records the timezone for each row

Example: If your data is in UTC and you set time_zone: "America/Los_Angeles", timestamps shift 7-8 hours earlier (depending on DST).
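The three bullets above can be sketched with the standard zoneinfo module (Python 3.9+): convert to the target zone, drop the offset to get a naive timestamp, and record the zone in a separate column. This is an illustration of the behavior, not the dsgrid code itself.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

target = ZoneInfo("America/Los_Angeles")
utc_ts = datetime(2020, 1, 1, 8, 0, tzinfo=timezone.utc)

# Convert to the target zone: 2020-01-01 08:00 UTC -> 00:00 PST (UTC-8).
local = utc_ts.astimezone(target)

# Strip the offset (timezone-naive) and record the zone separately.
row = {"time": local.replace(tzinfo=None), "time_zone": "America/Los_Angeles"}
```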


Filtering Results

The dimension_filters parameter applies post-processing filters to the query result:

dimension_filters: [
    DimensionFilters(
        dimension: "geography",
        records: ["06001", "06003", "06005"]  # Only these counties
    ),
    DimensionFilters(
        dimension: "subsector",
        records: ["110", "120"]  # Only these subsectors
    )
]

Filters are applied after the query executes, removing rows that don’t match. Unlike query-level filters, which reduce computation during execution, these post-processing filters extract subsets from the finished result.
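The post-processing step amounts to keeping rows whose dimension records appear in the allowed sets. A plain-Python sketch of that logic (not the dsgrid implementation):

```python
# Allowed records per dimension, mirroring the filter config above.
allowed = {
    "geography": {"06001", "06003", "06005"},
    "subsector": {"110", "120"},
}

rows = [
    {"geography": "06001", "subsector": "110", "value": 1200.5},
    {"geography": "06002", "subsector": "110", "value": 950.3},
]

# Keep a row only if it matches every filter.
kept = [
    r for r in rows
    if all(r[dim] in records for dim, records in allowed.items())
]
```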


Reports

The reports parameter generates pre-defined analysis reports on the query result:

reports: [
    ReportInputModel(report_name: "peak_load"),
    ReportInputModel(report_name: "seasonal_pattern")
]

Each report generates additional output files with specific analyses (e.g., peak load analysis, seasonal breakdowns). Available reports depend on your dsgrid installation.


Complete Example

Here’s a complete QueryResultParamsModel configuration:

from dsgrid.query.models import (
    QueryResultParamsModel,
    AggregationModel,
    DimensionFilters,
    ColumnType,
    StackedTableFormatModel,
    ReportInputModel,
)

result_params = QueryResultParamsModel(
    # Table structure
    table_format=StackedTableFormatModel(),
    column_type=ColumnType.DIMENSION_NAMES,

    # Output format
    output_format="parquet",

    # Data presentation
    replace_ids_with_names=True,
    sort_columns=["geography", "time"],

    # Filtering and aggregation
    aggregations=[
        AggregationModel(
            dimension="subsector",
            aggregation_function="sum"
        )
    ],
    aggregate_each_dataset=False,
    dimension_filters=[
        DimensionFilters(
            dimension="geography",
            records=["06001", "06003"]
        )
    ],

    # Time zone conversion
    time_zone="America/Los_Angeles",

    # Additional analysis
    reports=[
        ReportInputModel(report_name="peak_load")
    ]
)

This configuration produces a Parquet file with:

  • Stacked rows (long format)

  • Subsectors aggregated (summed)

  • Only California counties 06001 and 06003

  • Timestamps converted to America/Los_Angeles

  • A peak load report generated

  • Columns sorted by geography then time

  • Human-readable dimension names