Installation
Python Environment
dsgrid requires Python 3.11 or later. If you do not already have a suitable Python environment, we recommend using Conda to manage your Python packages and environments.
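One way to confirm that the active interpreter satisfies the version requirement is a short check like the one below. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of dsgrid.

```python
import sys

MINIMUM = (3, 11)  # minimum Python version stated in this section


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if the (major, minor) part of `version` is at least `minimum`.

    `version` defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for dsgrid"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Running this inside the activated environment reports whether the interpreter there is new enough.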
Steps to Create a dsgrid Conda Environment
Download and install Conda if it is not already installed. We recommend Miniconda over Anaconda because it has a smaller installation size.
Create a suitable environment:
conda create -n dsgrid python=3.11
Activate the environment:
conda activate dsgrid
Install Java
dsgrid’s key dependency is Apache Spark. Apache Spark requires Java, so check if you have it. Both of these commands must work:
macOS/Linux:
java --version
# openjdk 11.0.12 2021-07-20
echo $JAVA_HOME
# /Users/dthom/brew/Cellar/openjdk@11/11.0.12
# If you don't have java installed:
conda install openjdk
Windows:
java --version
# openjdk 11.0.13 2021-10-19
# OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef
echo %JAVA_HOME%
# C:\Users\ehale\Anaconda3\envs\dsgrid\Library
# If you don't have java installed:
conda install openjdk
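The same two checks (a `java` executable on the search path and a non-empty `JAVA_HOME`) can be done in a platform-independent way from Python. The sketch below uses only the standard library; `check_java` is a hypothetical helper, not a dsgrid function.

```python
import os
import shutil


def check_java(env=None):
    """Report where `java` is found on PATH and what JAVA_HOME is set to.

    `env` is a mapping of environment variables; it defaults to os.environ.
    Missing or empty values are reported as None.
    """
    if env is None:
        env = os.environ
    return {
        # Full path to the java executable, or None if it is not on PATH.
        "java_on_path": shutil.which("java", path=env.get("PATH", "")),
        # Value of JAVA_HOME, or None if it is unset or empty.
        "java_home": env.get("JAVA_HOME") or None,
    }


if __name__ == "__main__":
    for name, value in check_java().items():
        print(f"{name}: {value if value is not None else 'NOT FOUND'}")
```

If either entry is reported as not found, installing `openjdk` into the Conda environment as described above is the suggested remedy.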
Package Installation
To use DuckDB as the backend:
pip install dsgrid-toolkit
To use Apache Spark as the backend:
pip install "dsgrid-toolkit[spark]"
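After installing, the standard library's `importlib.metadata` can confirm which distributions are present in the environment. This is a minimal sketch: `installed_version` is an illustrative helper, and checking for `pyspark` rests on the assumption that the `[spark]` extra installs it.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyspark" is an assumption about what the [spark] extra provides.
    for name in ("dsgrid-toolkit", "pyspark"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```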
Registry
Standalone Registry
To use dsgrid in your own computational environment, you will need to initialize your own registry with the create-registry CLI command. View its options with:
dsgrid create-registry --help
Apache Spark
NLR High Performance Computing: How to Start Spark Cluster on Kestrel
Standalone resources: [TODO: Provide link]
Test Your Installation
If you’re running dsgrid at NLR and using the shared registry, you can test your installation with this command:
dsgrid -u sqlite:///<your-db-path> registry projects list
Save Your Configuration
The dsgrid config create command stores key information for working with dsgrid in a config file at ~/.dsgrid.json5. dsgrid currently supports only offline mode, so the main setting to store is the registry URL. The parameters in the config file serve as the default values for the command-line interface.
The appropriate configuration for using the shared registry at NLR is:
dsgrid config create sqlite:////projects/dsgrid/standard-scenarios.db
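The registry URLs in this section follow the common `sqlite:///<path>` form, so an absolute POSIX path (which itself begins with `/`) produces four consecutive slashes, as in the command above. The sketch below illustrates that convention; `sqlite_url` is a hypothetical helper, and it assumes dsgrid interprets the URL in this conventional way.

```python
def sqlite_url(db_path):
    """Build a SQLite URL by prefixing `db_path` with 'sqlite:///'.

    An absolute POSIX path starts with '/', so the result has four slashes;
    a relative path yields three.
    """
    return "sqlite:///" + str(db_path)


if __name__ == "__main__":
    # Absolute path taken from the shared-registry example above.
    print(sqlite_url("/projects/dsgrid/standard-scenarios.db"))
    # Relative path; "registry.db" is a placeholder file name.
    print(sqlite_url("registry.db"))
```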
AWS Cloud Access
Access from AWS is under development.
Next Steps
Learn about browsing the registry
Explore the tutorials to get started with dsgrid
Understand data file formats for preparing your data