How to Filter a Query

dsgrid offers several ways to filter the result of a query. Understanding how dsgrid applies filters helps you get correct and efficient results; refer to query concepts for details.

The examples below show how to define the filters in JSON5 or Python as well as the equivalent implementation if you were to filter the dataframe with Spark in Python (pyspark).

All examples except DimensionFilterBetweenColumnOperatorModel assume that the dataframe being filtered is the dimension record table. DimensionFilterBetweenColumnOperatorModel assumes that the table is the load data dataframe with time-series information.

Note

Whenever multiple filters are provided in an array, dsgrid combines them with a logical AND: a row is kept only if it passes every filter.

Common Operators

| Operator | Description | Example Value |
| --- | --- | --- |
| `==` | Equals | `"06037"` |
| `!=` | Not equals | `"06037"` |
| `>` | Greater than | `"2020"` |
| `>=` | Greater than or equal | `"2020"` |
| `<` | Less than | `"2050"` |
| `<=` | Less than or equal | `"2050"` |
| `isin` | In list | `["06037", "06073"]` |
| `like` | Pattern match | `"%County"` |
| `rlike` | Regex match | `"^06.*"` |
| `between` | Between two values | `["2020", "2050"]` |
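The operators in the table can be pictured with a small plain-Python sketch. It is not dsgrid's implementation: it assumes the operators behave like the Spark column methods of the same names (`like` uses SQL wildcards and must match the whole value, `rlike` matches anywhere in the value, `between` is inclusive), and the helper names `like_to_regex` and `matches` are hypothetical. Python's `re` module also differs from Spark's Java regex engine in some details, so treat this as an approximation.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate SQL LIKE wildcards (% and _) into a regular expression."""
    pieces = []
    for char in pattern:
        if char == "%":
            pieces.append(".*")  # % matches any run of characters
        elif char == "_":
            pieces.append(".")  # _ matches exactly one character
        else:
            pieces.append(re.escape(char))
    return "".join(pieces)


def matches(cell: str, operator: str, value) -> bool:
    """Evaluate one operator against a single string cell (hypothetical helper)."""
    if operator == "==":
        return cell == value
    if operator == "!=":
        return cell != value
    if operator == ">":
        return cell > value
    if operator == ">=":
        return cell >= value
    if operator == "<":
        return cell < value
    if operator == "<=":
        return cell <= value
    if operator == "isin":
        return cell in value
    if operator == "like":
        # LIKE must match the entire value, so use fullmatch.
        return re.fullmatch(like_to_regex(value), cell, flags=re.DOTALL) is not None
    if operator == "rlike":
        # Assumed to match anywhere in the value unless the pattern is anchored.
        return re.search(value, cell) is not None
    if operator == "between":
        low, high = value
        return low <= cell <= high  # both bounds are included
    raise ValueError(f"Unsupported operator: {operator}")
```

Because the example values are strings, comparisons such as `>` and `between` are lexicographic in this sketch. That agrees with numeric order for four-digit years, but it would not for values of differing lengths.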

Negating Filters

Set negate: true to invert any filter. For example, to exclude specific counties:

dimension_filters: [
  {
    dimension_type: "geography",
    dimension_name: "county",
    column: "id",
    operator: "isin",
    value: ["02013", "02016"],  // Alaska counties
    filter_type: "column_operator",
    negate: true,  // Exclude these counties
  },
]
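As a rough illustration of what `negate: true` does to an `isin` filter, the sketch below filters a list of records in plain Python. The function `isin_filter` and the sample records are hypothetical and stand in for the county dimension record table; dsgrid itself applies the filter with Spark.

```python
def isin_filter(records, column, values, negate=False):
    """Keep records whose column is in values, or not in values when negate is True."""
    allowed = set(values)
    kept = []
    for record in records:
        keep = record[column] in allowed
        if negate:
            keep = not keep  # invert the decision for every record
        kept.append(record) if keep else None
    return kept


# Hypothetical county records; only the "id" column matters here.
counties = [
    {"id": "02013", "name": "Aleutians East Borough"},
    {"id": "02016", "name": "Aleutians West Census Area"},
    {"id": "06037", "name": "Los Angeles County"},
]

excluded = isin_filter(counties, "id", ["02013", "02016"], negate=True)
included = isin_filter(counties, "id", ["02013", "02016"], negate=False)
```

With `negate=True` only the Los Angeles record remains; with `negate=False` the same filter keeps the two Alaska records instead.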

Combining Multiple Filters

When multiple filters are provided in an array, dsgrid applies them with an AND operation, so a row must satisfy all of them. For example, to keep only California counties for model years 2030 through 2050:

dimension_filters: [
  {
    dimension_type: "geography",
    dimension_name: "county",
    column: "id",
    operator: "like",
    value: "06%",  // CA counties start with 06
    filter_type: "column_operator",
    negate: false,
  },
  {
    dimension_type: "model_year",
    dimension_name: "model_year",
    column: "id",
    operator: "between",
    value: ["2030", "2050"],
    filter_type: "column_operator",
    negate: false,
  },
]
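The AND behavior can be sketched in plain Python as follows. This is an illustration under assumptions, not dsgrid code: the rows are a hypothetical table that already carries both a `county` and a `model_year` column, `startswith("06")` stands in for `like "06%"`, and `between` is assumed to include both bounds.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def is_california(row: Row) -> bool:
    """Stand-in for: column id like "06%" on the county dimension."""
    return row["county"].startswith("06")


def in_year_range(row: Row) -> bool:
    """Stand-in for: column id between ["2030", "2050"], bounds included."""
    return "2030" <= row["model_year"] <= "2050"


def apply_all(rows: List[Row], predicates: List[Predicate]) -> List[Row]:
    """Keep a row only if every predicate accepts it (logical AND)."""
    return [row for row in rows if all(p(row) for p in predicates)]


# Hypothetical sample rows.
rows = [
    {"county": "06037", "model_year": "2030"},  # passes both filters
    {"county": "06037", "model_year": "2020"},  # fails the year filter
    {"county": "02013", "model_year": "2040"},  # fails the county filter
    {"county": "06073", "model_year": "2050"},  # passes both filters
]

result = apply_all(rows, [is_california, in_year_range])
```

Adding a filter to the list can only shrink the result, which is why starting with one filter and adding more is a convenient way to check each condition in turn.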

Best Practices

Filter early: Apply filters at the dataset level when possible to reduce data processing
Use appropriate filter types: Choose the most specific filter type for your use case
Leverage supplemental dimensions: Use supplemental dimension filters for complex aggregations
Test incrementally: Start with simple filters and add complexity
Check dimension names: Ensure dimension names match those defined in the project

Next Steps

Learn about query concepts for understanding query processing
Follow the query project tutorial
Explore the CLI reference for command-line query options
