# Installation

## Python Environment

dsgrid requires Python 3.11 or later. If you do not already have a Python environment with Python 3.11 or later, we recommend using [Conda](https://conda.io/projects/conda/en/latest/index.html) to manage your Python packages and environments.

### Steps to Create a dsgrid Conda Environment

1. [Download and install Conda](https://conda.io/projects/conda/en/latest/user-guide/install) if it is not already installed. We recommend Miniconda over Anaconda because it has a smaller installation size.

2. Create a suitable environment:

   ```bash
   conda create -n dsgrid python=3.11
   ```

3. Activate the environment:

   ```bash
   conda activate dsgrid
   ```

### Install Java

dsgrid's key dependency is Apache Spark. Apache Spark requires Java, so check whether you have it. Both of these commands must work:

::::{tab-set}

:::{tab-item} Mac/Linux

```bash
java --version
# openjdk 11.0.12 2021-07-20

echo $JAVA_HOME
# /Users/dthom/brew/Cellar/openjdk@11/11.0.12

# If you don't have java installed:
conda install openjdk
```

:::

:::{tab-item} Windows

```pwsh
java --version
# openjdk 11.0.13 2021-10-19
# OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef

# cmd.exe syntax shown; in PowerShell use: echo $env:JAVA_HOME
echo %JAVA_HOME%
# C:\Users\ehale\Anaconda3\envs\dsgrid\Library

# If you don't have java installed:
conda install openjdk
```

:::

::::

## Package Installation

### To use DuckDB as the backend:

```bash
pip install dsgrid-toolkit
```

### To use Apache Spark as the backend:

```bash
pip install "dsgrid-toolkit[spark]"
```

## Registry

### NLR Shared Registry

The current dsgrid registries are stored in per-project SQLite database files. All configuration information is stored in the database(s), and all dataset files are stored on the NLR HPC shared filesystem.
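
Because each registry is stored in a SQLite database file, Python's standard `sqlite3` module can be used to confirm that a file is readable before pointing dsgrid at it. The following is a minimal sketch and not part of dsgrid: `list_tables` is a hypothetical helper, and it assumes nothing about dsgrid's table names. It only reports whatever tables the file contains. The demo runs against a throwaway database so that it works anywhere.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in a SQLite database file, opened read-only.

    Hypothetical helper for illustration; it is not a dsgrid API.
    """
    path = Path(db_path).resolve()
    # mode=ro opens the file read-only and raises sqlite3.OperationalError
    # if the file does not exist, instead of silently creating it.
    uri = f"{path.as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Demo on a temporary database; substitute the path to a real registry file.
    with tempfile.TemporaryDirectory() as tmp:
        demo_db = Path(tmp) / "demo.db"
        conn = sqlite3.connect(demo_db)
        conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY)")
        conn.commit()
        conn.close()
        print(list_tables(str(demo_db)))
```

Opening the file read-only keeps the check from modifying a shared registry or creating an empty database at a mistyped path.
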
### Standalone Registry

To use dsgrid in your own computational environment, you will need to initialize your own registry with the `create-registry` CLI command. Run it with `--help` to see the available options:

```bash
dsgrid create-registry --help
```

## Apache Spark

- **NLR High Performance Computing**: [How to Start Spark Cluster on Kestrel](../user_guide/how_tos/spark_cluster_on_kestrel)
- **Standalone resources**: [TODO: Provide link]

## Test Your Installation

If you're running dsgrid at NLR and using the shared registry, you can test your installation with this command:

```bash
dsgrid -u sqlite:/// registry projects list
```

## Save Your Configuration

Running `dsgrid config create` stores key information for working with dsgrid in a config file at `~/.dsgrid.json5`. Currently, dsgrid supports only offline mode, so the other key piece of information to store is the registry URL. The parameters in the config file serve as the default values for the command-line interface.

The appropriate configuration for using the shared registry at NLR is:

```bash
dsgrid config create sqlite:////projects/dsgrid/standard-scenarios.db
```

:::{admonition} AWS Cloud Access
:class: note

Access from AWS is under development.
:::

## Next Steps

- Learn about [browsing the registry](../user_guide/how_tos/browse_registry)
- Explore the [tutorials](../user_guide/tutorials/index) to get started with dsgrid
- Understand [data file formats](../user_guide/dataset_registration/data_file_formats) for preparing your data