Project Coordinators¶
Project coordinators define the structure of a dsgrid project: its base dimensions, supplemental dimensions, dataset requirements, and queries. They are responsible for assembling datasets from multiple contributors into a coherent, queryable whole.
Prerequisites¶
Install dsgrid on your system, including Spark extras:
pip install "dsgrid-toolkit[spark]"
Access to NREL HPC (most project coordination tasks involve large datasets)
See How to Start a Spark Cluster on Kestrel for cluster setup
Workflow Overview¶
Design the project — Define the dimensional structure that all datasets will map to. Read Project Concepts for an overview, design considerations, and links to key Dimension and Dataset concepts.
Create base dimensions — Define the finest-grained dimensions the project will support. Follow How to Create Base Dimensions.
Create supplemental dimensions — Define alternative dimensions, typically aggregations, that can be used for querying (e.g., counties → states). Follow How to Create Supplemental Dimensions.
Register the project — Create the project config and register it. See the Create a Project tutorial.
Coordinate dataset submissions — Work with dataset submitters to register and validate their contributions.
Define and run queries — Use queries to assemble and transform project data. See Query Concepts and How to Filter Query Results.
Create derived datasets — Build derived datasets from query results for publication. See Derived Dataset Concepts.
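To make the supplemental-dimensions step above concrete, here is a plain-Python sketch of what a counties → states mapping accomplishes: base-dimension (county-level) values are rolled up through a supplemental-dimension mapping for querying at the coarser level. This illustrates the concept only; the dimension names and data are made up, and dsgrid itself performs such mappings with Spark against registered dimension records, not with this code.

```python
# Conceptual sketch only: names and data are illustrative, not dsgrid's API.
from collections import defaultdict

# Base-dimension records (county level) with associated values.
county_values = {"Larimer": 10.0, "Boulder": 5.0, "Ada": 7.5}

# Supplemental-dimension mapping: county -> state.
county_to_state = {"Larimer": "CO", "Boulder": "CO", "Ada": "ID"}

def aggregate_to_states(values, mapping):
    """Sum county-level values up to the state level."""
    totals = defaultdict(float)
    for county, value in values.items():
        totals[mapping[county]] += value
    return dict(totals)

print(aggregate_to_states(county_values, county_to_state))
# → {'CO': 15.0, 'ID': 7.5}
```

A real supplemental dimension also carries mapping records registered with the project, so every dataset mapped to the base dimensions can be queried at the aggregated level without the submitter doing anything extra.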