Dataset Mappers

Dataset mappers create the dimension mappings that align a dataset’s dimensions with a project’s base and supplemental dimensions. This is often done by the dataset submitter, but may be handled by a separate team member who understands both the dataset and project dimension structures.

Prerequisites

  • dsgrid installed on your system

  • Access to the project registry (to inspect base dimensions)

  • Familiarity with the dataset’s dimension records

Workflow Overview

  1. Understand dimension mapping — Read Dimension Mapping Concepts to learn how dsgrid translates between dimension systems.

  2. Inspect project dimensions — Use How to Browse the Registry to view the project’s base dimension records and compare them with your dataset’s dimensions.

  3. Choose mapping types — Review Mapping Types (one-to-one, many-to-one, many-to-many) and decide the appropriate type for each dimension.

  4. Create mapping files — Write CSV mapping files and mapping configs. See Mapping Workflows for the step-by-step process.

  5. Validate — dsgrid validates mappings during registration. Review any errors and fix your mapping files.
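Before registering, it can help to run a quick local sanity check on a draft mapping. The sketch below is not part of dsgrid: `classify_mapping` and `check_mapping` are hypothetical helpers, the county and state IDs are made-up sample records, and the `from_id`/`to_id` column names are an assumption that should be confirmed against Mapping Workflows. It labels a mapping by its shape, reports IDs that are missing or unknown, and writes the pairs out as CSV text.

```python
import csv
import io
from collections import Counter


def classify_mapping(pairs):
    """Label a collection of (from_id, to_id) pairs by its shape."""
    unique = set(pairs)
    targets_per_source = Counter(f for f, _ in unique)
    sources_per_target = Counter(t for _, t in unique)
    fan_out = any(n > 1 for n in targets_per_source.values())
    fan_in = any(n > 1 for n in sources_per_target.values())
    if fan_out and fan_in:
        return "many-to-many"
    if fan_in:
        return "many-to-one"
    if fan_out:
        # Not one of the three types named in the workflow; reported for completeness.
        return "one-to-many"
    return "one-to-one"


def check_mapping(pairs, dataset_ids, project_ids):
    """Report IDs that are unmapped or that do not exist on either side."""
    from_ids = {f for f, _ in pairs}
    to_ids = {t for _, t in pairs}
    return {
        "unmapped_dataset_ids": sorted(set(dataset_ids) - from_ids),
        "unknown_from_ids": sorted(from_ids - set(dataset_ids)),
        "unknown_to_ids": sorted(to_ids - set(project_ids)),
    }


# Hypothetical sample records: dataset counties mapped to project states.
dataset_ids = ["06037", "06073", "08031"]
project_ids = ["CA", "CO"]
pairs = [("06037", "CA"), ("06073", "CA"), ("08031", "CO")]

print(classify_mapping(pairs))
print(check_mapping(pairs, dataset_ids, project_ids))

# Write the pairs as CSV text; the header names are an assumption.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["from_id", "to_id"])
writer.writerows(pairs)
csv_text = buffer.getvalue()
print(csv_text)
```

A check like this only catches structural problems such as missing or misspelled IDs. dsgrid's own validation during registration remains the authoritative test.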

When You Need Apache Spark

Mapping validation and application can be computationally intensive for datasets with many records or fine-grained dimensions. If you are working with large datasets on NLR HPC, plan to run these steps with Apache Spark.
