How to Run dsgrid on Kestrel
This guide explains how to run dsgrid on NREL's Kestrel HPC system.
Steps
1. Start a Screen Session
SSH to a login node and start a screen session (or similar tool like tmux):
screen -S dsgrid
This allows you to maintain your session even if you disconnect.
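If you prefer a single command that either creates the session or reattaches to an existing one, screen's -d -RR flags do both (the session name dsgrid matches the example above):

```shell
# Create the "dsgrid" session if it does not exist; otherwise detach it
# from any other terminal and reattach it here.
screen -d -RR -S dsgrid

# From inside the session, detach with Ctrl-a d; list sessions with:
screen -ls
```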
2. Install dsgrid
Follow the installation instructions at Installation.
3. Create Runtime Config
Create a dsgrid runtime config file pointing to the shared registry:
dsgrid config create sqlite:////projects/dsgrid/standard-scenarios.db
This configures dsgrid to use the NREL shared registry database.
4. Start a Spark Cluster
Start a Spark cluster with your desired number of compute nodes by following the instructions at Start Spark Cluster on Kestrel.
5. Run dsgrid Commands
Run all CPU-intensive dsgrid commands from the first node in your HPC allocation using spark-submit:
spark-submit --master=spark://$(hostname):7077 $(which dsgrid-cli.py) [command] [options] [args]
Examples:
Register a dataset:
spark-submit --master=spark://$(hostname):7077 $(which dsgrid-cli.py) \
registry datasets register dataset.json5 \
-l "Register my dataset"
Run a query:
spark-submit --master=spark://$(hostname):7077 $(which dsgrid-cli.py) \
query project run query.json5
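Typing the spark-submit prefix for every command gets repetitive; a small shell function (a convenience wrapper of our own, not part of dsgrid) can supply it:

```shell
# Convenience wrapper: run any dsgrid command through spark-submit against
# the cluster started in step 4. The function name is ours; the command
# form matches the examples in this guide.
dsgrid_spark() {
    spark-submit --master="spark://$(hostname):7077" "$(which dsgrid-cli.py)" "$@"
}

# Usage:
#   dsgrid_spark registry datasets register dataset.json5 -l "Register my dataset"
#   dsgrid_spark query project run query.json5
```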
6. Resume After Disconnect
Because you started a screen session at the beginning, you can pick your work back up after any SSH disconnection:
SSH to the same login node you used initially
Resume your screen session:
screen -r dsgrid
Tips
Use screen or tmux: Long-running jobs benefit from persistent sessions
Monitor jobs: Use squeue -u $USER to check job status
Check logs: Review Spark driver and executor logs for debugging
Adjust resources: Scale your Spark cluster based on data size
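For the monitoring tips above, a quick status check might look like this (the log directory is an assumption; Spark honors SPARK_LOG_DIR when your cluster-start scripts set it):

```shell
# Show your queued and running Slurm jobs.
squeue -u $USER

# Inspect recent Spark master/worker log output for errors (path assumed;
# adjust to wherever your cluster-start scripts write logs).
tail -n 50 "${SPARK_LOG_DIR:-$HOME/spark-logs}"/spark-*.out
```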
Common Commands
List available registries:
dsgrid registry list
Check project details:
dsgrid registry projects show <project-id>
Validate a config file:
dsgrid config validate dataset.json5
Next Steps
Learn how to start a Spark cluster on Kestrel
Follow the query project tutorial
Understand Apache Spark configuration