Dataset Mappers¶

Dataset mappers create the dimension mappings that align a dataset’s dimensions with a project’s base and supplemental dimensions. This is often done by the dataset submitter, but may be handled by a separate team member who understands both the dataset and project dimension structures.

Prerequisites¶

dsgrid installed on your system
Access to the project registry (to inspect base dimensions)
Familiarity with the dataset’s dimension records

Workflow Overview¶

Understand dimension mapping — Read Dimension Mapping Concepts to learn how dsgrid translates between dimension systems.
Inspect project dimensions — Use How to Browse the Registry to view the project’s base dimension records and compare them with your dataset’s dimensions.
Choose mapping types — Review Mapping Types (one-to-one, many-to-one, many-to-many) and decide the appropriate type for each dimension.
Create mapping files — Write CSV mapping files and mapping configs. See Mapping Workflows for the step-by-step process.
Validate — dsgrid validates mappings during registration. Review any errors and fix your mapping files.
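Before registering, it can help to run a quick local check on a mapping file. The sketch below is not part of dsgrid and does not replace its validation. It assumes a two-column CSV with `from_id` and `to_id` headers; confirm the exact layout in Mapping Workflows. The record IDs, file contents, and function names are hypothetical. It reports IDs that are missing or unknown and describes the shape of the mapping in generic terms, which may not match dsgrid's own mapping type names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dimension record IDs. In practice, take the dataset IDs from
# your dataset's dimension records and the project IDs from the registry.
DATASET_IDS = {"01001", "01003", "01005"}
PROJECT_IDS = {"AL", "AK"}

# Hypothetical mapping file contents (assumed from_id/to_id column layout).
MAPPING_CSV = """from_id,to_id
01001,AL
01003,AL
01005,AL
"""


def read_mapping(text):
    """Return a list of (from_id, to_id) pairs parsed from CSV text.

    IDs are kept as strings, so leading zeros such as "01001" survive.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["from_id"], row["to_id"]) for row in reader]


def check_mapping(pairs, dataset_ids, project_ids):
    """Compare mapping IDs against the dataset and project record IDs."""
    from_ids = {f for f, _ in pairs}
    to_ids = {t for _, t in pairs}
    return {
        # Dataset records with no row in the mapping file.
        "unmapped_dataset_ids": sorted(dataset_ids - from_ids),
        # Mapping rows whose from_id is not a dataset record.
        "unknown_from_ids": sorted(from_ids - dataset_ids),
        # Mapping rows whose to_id is not a project record.
        "unknown_to_ids": sorted(to_ids - project_ids),
    }


def classify_mapping(pairs):
    """Describe the mapping shape using generic relational terms."""
    targets = defaultdict(set)  # from_id -> set of to_ids
    sources = defaultdict(set)  # to_id -> set of from_ids
    for f, t in pairs:
        targets[f].add(t)
        sources[t].add(f)
    fan_out = any(len(v) > 1 for v in targets.values())
    fan_in = any(len(v) > 1 for v in sources.values())
    if fan_out and fan_in:
        return "many-to-many"
    if fan_out:
        return "one-to-many"
    if fan_in:
        return "many-to-one"
    return "one-to-one"


if __name__ == "__main__":
    pairs = read_mapping(MAPPING_CSV)
    print(check_mapping(pairs, DATASET_IDS, PROJECT_IDS))
    print(classify_mapping(pairs))
```

Empty lists in the report mean every dataset ID is mapped and every target exists in the project dimension. Any other result points to rows worth fixing before dsgrid's registration-time validation runs.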

When You Need Apache Spark¶

Mapping validation and application can be computationally intensive for datasets with many records or fine-grained dimensions. If you are working with large datasets on NLR HPC:

Install the Spark extras: pip install "dsgrid-toolkit[spark]"
See How to Run dsgrid on Kestrel

Key Resources¶

Core Concepts¶

How-Tos¶

Tutorials¶

Software Reference¶