(data-download)=

# Data Download

STRIDE retrieves input datasets from remote repositories, typically GitHub releases. This page explains how the download system works and how to use it.

## Default Data Repository

STRIDE uses the [dsgrid/stride-data](https://github.com/dsgrid/stride-data) repository as its primary data source. This repository contains:

- **global** - Full dataset for all supported countries
- **global-test** - Smaller test dataset for development and testing

## How Downloads Work

The download system follows this flow:

1. **Query GitHub API** - Fetch available release versions from the repository
2. **Select Version** - Use the latest release by default, or specify a version
3. **Download Archive** - Retrieve the release tarball
4. **Extract and Install** - Extract to `~/.stride/data//`

### Authentication

STRIDE attempts to use the GitHub CLI (`gh`) for authentication when available. This is useful for:

- Private repositories
- Avoiding rate limits on public repositories

If the GitHub CLI is not installed or authenticated, STRIDE falls back to unauthenticated API requests.

## CLI Commands

### List Available Datasets

```bash
stride datasets list-remote
```

This shows all known datasets with their available versions.

### Download a Dataset

```bash
# Download the global dataset (latest version)
# This automatically includes the global-test subset
stride datasets download global

# Download a specific version
stride datasets download global --version v0.2.0
```

### Download from a Custom Repository

For advanced use cases, you can download from any GitHub repository:

```bash
stride datasets download --url dsgrid/my-custom-data --subdirectory my-subset
```

## Storage Location

Downloaded datasets are stored in the data directory, which is determined by (in order of precedence):

1. The `--data-dir` CLI option
2. The `STRIDE_DATA_DIR` environment variable
3. The default location: `~/.stride/data/`

```
~/.stride/data/
├── global/
│   ├── dimension_mappings.json5
│   ├── energy_intensity/
│   ├── gdp/
│   ├── load_shapes/
│   └── ...
└── global-test/
    └── ...
```

### Custom Data Directory

To use a custom data directory persistently, set the `STRIDE_DATA_DIR` environment variable:

```bash
export STRIDE_DATA_DIR=/path/to/my/data
stride datasets download global
stride projects create my_config.json5
```

Or specify it per-command with the `--data-dir` option:

```bash
stride datasets download global --data-dir /path/to/my/data
stride projects create my_config.json5 --data-dir /path/to/my/data
```

## Using Test Data

When creating a project, you can use the smaller test dataset for faster iteration:

```{eval-rst}
.. tabs::

   .. tab:: CLI

      .. code-block:: console

         # Use test data for development
         $ stride projects create my_config.json5 --dataset global-test

         # Use full data for production
         $ stride projects create my_config.json5

   .. tab:: Python

      .. code-block:: python

         from stride import Project

         # Use test data for development
         project = Project.create("my_config.json5", dataset="global-test")

         # Use full data for production
         project = Project.create("my_config.json5")
```

The test dataset contains the same structure but with reduced data volume, making it suitable for:

- Development and debugging
- CI/CD pipelines
- Learning the STRIDE workflow

## Error Handling

Common download issues:

- **Dataset not found** - Ensure the dataset name matches a known dataset or provide a valid `--url`
- **Authentication required** - Install and authenticate the GitHub CLI with `gh auth login`
- **Destination exists** - Remove the existing dataset directory before downloading again
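
## Example: Data Directory Precedence

The precedence order described under Storage Location can be illustrated with a minimal Python sketch. This is not STRIDE's implementation: the helper name `resolve_data_dir` and its parameters are hypothetical, chosen only for illustration. The `STRIDE_DATA_DIR` variable name and the `~/.stride/data/` default are the only details taken from this page.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location documented on this page.
DEFAULT_DATA_DIR = Path("~/.stride/data")


def resolve_data_dir(
    cli_data_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical helper (not part of STRIDE) mirroring the documented precedence."""
    env = os.environ if env is None else env

    # 1. An explicit --data-dir value takes precedence over everything else.
    if cli_data_dir:
        return Path(cli_data_dir).expanduser()

    # 2. Otherwise use the STRIDE_DATA_DIR environment variable, if set.
    env_value = env.get("STRIDE_DATA_DIR")
    if env_value:
        return Path(env_value).expanduser()

    # 3. Otherwise fall back to the default location.
    return DEFAULT_DATA_DIR.expanduser()


# The CLI option wins even when the environment variable is also set.
print(resolve_data_dir("/path/to/cli/data", {"STRIDE_DATA_DIR": "/path/to/env/data"}))
```

Passing the environment as a mapping, rather than reading `os.environ` directly inside the function, is a design choice for this sketch only: it makes each precedence rule easy to check in isolation.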