Customizing Dataset Checks¶
STRIDE performs validation checks when registering datasets with dsgrid. These checks can be customized using environment variables or programmatically through the API.
Available Checks¶
STRIDE supports the following dataset validation checks:
Check |
Environment Variable |
Default |
Description |
|---|---|---|---|
Time Consistency |
|
|
Validates that time dimensions are consistent across datasets |
Dimension Associations |
|
|
Validates that dimension associations (e.g., geography-sector combinations) are consistent |
Using Environment Variables¶
The simplest way to customize checks is through environment variables:
# Disable time consistency checks (faster project creation)
export STRIDE_CHECK_TIME_CONSISTENCY=false
stride project create my_config.json5
# Enable dimension association checks (more thorough validation)
export STRIDE_CHECK_DIMENSION_ASSOCIATIONS=true
stride project create my_config.json5
Valid values for boolean environment variables:
True:
true,True,TRUE,1False:
false,False,FALSE,0
Programmatic Configuration¶
When using the Python API, you can pass custom requirements directly:
from stride import Project
from dsgrid.dimension.base_models import DatasetDimensionRequirements
# Create custom validation requirements
requirements = DatasetDimensionRequirements(
check_time_consistency=False,
check_dimension_associations=True,
require_all_dimension_types=False,
)
# Create project with custom requirements
project = Project.create(
"my_config.json5",
dataset_requirements=requirements,
)
When to Customize Checks¶
Disabling Time Consistency Checks¶
Consider disabling time consistency checks when:
You’re iterating quickly during development
Your datasets intentionally have different time ranges
You’re working with incomplete data and plan to fill gaps later
STRIDE_CHECK_TIME_CONSISTENCY=false stride project create config.json5
Enabling Dimension Association Checks¶
Consider enabling dimension association checks when:
You need to ensure complete coverage of all geographic regions
You’re preparing data for production use
You want to catch missing sector-geography combinations
STRIDE_CHECK_DIMENSION_ASSOCIATIONS=true stride project create config.json5
Check Behavior¶
Time Consistency (check_time_consistency=true)¶
When enabled, this check verifies:
All datasets have compatible time periods
Timestamps can be properly aligned for joins
No gaps exist in required time series
If validation fails, you’ll see an error indicating which datasets have mismatched time dimensions.
Dimension Associations (check_dimension_associations=true)¶
When enabled, this check verifies:
All expected dimension combinations exist
No orphaned dimension values (e.g., a sector without any geographic data)
Associations are complete across all datasets
This is a more expensive check and is disabled by default for performance.
Performance Considerations¶
Validation checks add overhead to project creation:
Configuration |
Speed |
Thoroughness |
|---|---|---|
Both disabled |
Fastest |
Minimal validation |
Time only (default) |
Fast |
Catches common issues |
Both enabled |
Slower |
Most thorough |
For development workflows, consider disabling checks to speed up iteration, then re-enable them before finalizing your project.