Data Download

STRIDE retrieves input datasets from remote repositories, typically GitHub releases. This page explains how the download system works and how to use it.

Default Data Repository

STRIDE uses the dsgrid/stride-data repository as its primary data source. This repository contains:

global - Full dataset for all supported countries
global-test - Smaller test dataset for development and testing

How Downloads Work

The download system follows this flow:

1. Query GitHub API - Fetch the available release versions from the repository
2. Select Version - Use the latest release by default, or a version you specify
3. Download Archive - Retrieve the release tarball
4. Extract and Install - Extract the archive into the data directory (by default ~/.stride/data/<dataset-name>/)
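The query, select, and download steps can be sketched in Python with only the standard library. This is a minimal illustration, not STRIDE's implementation: the function names are hypothetical, and it assumes the dataset is the release's source archive (the tarball_url field of a GitHub release object). If the data were instead attached as a release asset, the asset's browser_download_url would be used. Extraction is left out here.

```python
import json
import shutil
import urllib.request
from pathlib import Path
from typing import Dict, Optional

GITHUB_API = "https://api.github.com"


def release_api_url(repo: str, version: Optional[str] = None) -> str:
    """Build the GitHub REST API URL for one release of `repo` ("owner/name")."""
    if version is None:
        # GitHub resolves /releases/latest to the newest published, non-prerelease release.
        return f"{GITHUB_API}/repos/{repo}/releases/latest"
    # A specific version is looked up by its tag name, e.g. "v0.2.0".
    return f"{GITHUB_API}/repos/{repo}/releases/tags/{version}"


def fetch_release(repo: str, version: Optional[str] = None,
                  headers: Optional[Dict[str, str]] = None) -> dict:
    """Query the GitHub API and return the release metadata as a dict."""
    request = urllib.request.Request(release_api_url(repo, version), headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_tarball(release: dict, target: Path,
                     headers: Optional[Dict[str, str]] = None) -> Path:
    """Stream the release's source tarball (its tarball_url) to `target`."""
    request = urllib.request.Request(release["tarball_url"], headers=headers or {})
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, fetch_release("dsgrid/stride-data") resolves the latest release, while fetch_release("dsgrid/stride-data", "v0.2.0") pins a version, mirroring the --version option.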

Authentication

STRIDE attempts to use the GitHub CLI (gh) for authentication when it is available. Authenticated requests are useful for:

Accessing private repositories
Avoiding the stricter rate limit GitHub applies to unauthenticated API requests on public repositories

If the GitHub CLI is not installed or not authenticated, STRIDE falls back to unauthenticated API requests.
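A minimal sketch of such a fallback, assuming it works by asking gh for its stored token (the gh auth token subcommand) and sending it as a bearer token. The helper names are hypothetical and this is not STRIDE's own code:

```python
import shutil
import subprocess
from typing import Dict, Optional


def get_gh_token() -> Optional[str]:
    """Return the GitHub CLI's token, or None if gh is missing or not logged in."""
    if shutil.which("gh") is None:
        return None  # GitHub CLI not installed
    try:
        result = subprocess.run(
            ["gh", "auth", "token"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    token = result.stdout.strip()
    # A non-zero exit status means gh is installed but not authenticated.
    return token if result.returncode == 0 and token else None


def github_headers(token: Optional[str]) -> Dict[str, str]:
    """Build request headers, adding Authorization only when a token is available."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

When get_gh_token() returns None, github_headers(None) carries no Authorization header, which corresponds to the unauthenticated fallback described above.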

CLI Commands

List Available Datasets

stride datasets list-remote

This shows all known datasets with their available versions.
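Listing versions amounts to querying a repository's releases endpoint. The sketch below is a hypothetical illustration of that idea, not how stride datasets list-remote is implemented. It reads only the first page (up to 100 releases) and skips drafts:

```python
import json
import urllib.request
from typing import Dict, List, Optional


def fetch_releases(repo: str, headers: Optional[Dict[str, str]] = None) -> List[dict]:
    """Return up to 100 releases of `repo` ("owner/name") from the GitHub API."""
    url = f"https://api.github.com/repos/{repo}/releases?per_page=100"
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def published_versions(releases: List[dict]) -> List[str]:
    """Tag names of non-draft releases, in the order the API returned them."""
    return [r["tag_name"] for r in releases if not r.get("draft", False)]
```

For example, published_versions(fetch_releases("dsgrid/stride-data")) yields the tag names that could be passed to --version.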

Download a Dataset

# Download the global dataset (latest version)
# This automatically includes the global-test subset
stride datasets download global

# Download a specific version
stride datasets download global --version v0.2.0

Download from a Custom Repository

For advanced use cases, you can download from any GitHub repository by passing it as owner/name to --url:

stride datasets download --url dsgrid/my-custom-data --subdirectory my-subset
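This page does not spell out what --subdirectory selects. Assuming it names a folder at the top of the repository's contents (GitHub source tarballs wrap everything in a single top-level folder), extracting just that folder could look like the hypothetical sketch below. It copies regular files only and skips entries containing "..", so a crafted archive cannot write outside the destination:

```python
import shutil
import tarfile
from pathlib import Path, PurePosixPath


def extract_subdirectory(archive: Path, subdirectory: str, dest: Path) -> Path:
    """Extract <top-level folder>/<subdirectory>/... from `archive` into `dest`."""
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; remove it before downloading again")
    found = False
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            # parts[0] is the archive's single top-level folder.
            if len(parts) < 3 or parts[1] != subdirectory:
                continue
            relative = parts[2:]
            if ".." in relative:
                continue  # skip entries that would escape `dest`
            target = dest.joinpath(*relative)
            if member.isdir():
                target.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    shutil.copyfileobj(src, out)
            else:
                continue  # ignore links and special files
            found = True
    if not found:
        raise FileNotFoundError(f"{subdirectory!r} not found in {archive}")
    return dest
```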

Storage Location

Downloaded datasets are stored in the data directory, whose location is taken from the following sources in order of precedence:

The --data-dir CLI option
The STRIDE_DATA_DIR environment variable
The default location: ~/.stride/data/

With the default location, the directory layout looks like this:

~/.stride/data/
├── global/
│   ├── dimension_mappings.json5
│   ├── energy_intensity/
│   ├── gdp/
│   ├── load_shapes/
│   └── ...
└── global-test/
    └── ...
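The precedence rules can be expressed as a small resolver. This is a sketch of the documented order, not STRIDE's code; resolve_data_dir and dataset_dir are hypothetical names, and the env argument exists only so the logic can be tested without touching the real environment:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    cli_value: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the data directory: --data-dir, then STRIDE_DATA_DIR, then ~/.stride/data."""
    env = os.environ if env is None else env
    if cli_value:  # 1. explicit --data-dir option
        return Path(cli_value).expanduser()
    env_value = env.get("STRIDE_DATA_DIR")
    if env_value:  # 2. environment variable
        return Path(env_value).expanduser()
    return Path.home() / ".stride" / "data"  # 3. default location


def dataset_dir(name: str, data_dir: Path) -> Path:
    """Each dataset lives in its own subfolder, e.g. <data_dir>/global-test."""
    return data_dir / name
```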

Custom Data Directory

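To use a custom data directory for every command in the current shell session, set the STRIDE_DATA_DIR environment variable. To keep the setting across sessions, add it to your shell profile or to your system environment variables.

Linux and macOS (bash or zsh):

export STRIDE_DATA_DIR=/path/to/my/data
stride datasets download global
stride projects create my_config.json5

Windows Command Prompt:

set STRIDE_DATA_DIR=/path/to/my/data
stride datasets download global
stride projects create my_config.json5

Windows PowerShell:

$env:STRIDE_DATA_DIR = "/path/to/my/data"
stride datasets download global
stride projects create my_config.json5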

Or specify it per-command with the --data-dir option:

stride datasets download global --data-dir /path/to/my/data
stride projects create my_config.json5 --data-dir /path/to/my/data
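When driving the CLI from a Python script, the environment variable can be set for the child process only, leaving the parent environment unchanged. A hypothetical helper, assuming the stride executable is on PATH:

```python
import os
import subprocess
from typing import Dict, List


def stride_env(data_dir: str) -> Dict[str, str]:
    """Return a copy of the current environment with STRIDE_DATA_DIR set."""
    env = dict(os.environ)
    env["STRIDE_DATA_DIR"] = data_dir
    return env


def run_stride(args: List[str], data_dir: str) -> None:
    """Run a stride CLI command with STRIDE_DATA_DIR set for that process only."""
    subprocess.run(["stride", *args], env=stride_env(data_dir), check=True)


# Example (requires the stride CLI):
# run_stride(["datasets", "download", "global"], "/path/to/my/data")
# run_stride(["projects", "create", "my_config.json5"], "/path/to/my/data")
```

Passing --data-dir in args instead would work the same way; the environment variable simply avoids repeating it on every command.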

Using Test Data

When creating a project, you can use the smaller test dataset for faster iteration:

From the command line:

# Use test data for development
stride projects create my_config.json5 --dataset global-test

# Use full data for production
stride projects create my_config.json5

From Python:

from stride import Project

# Use test data for development
project = Project.create("my_config.json5", dataset="global-test")

# Use full data for production
project = Project.create("my_config.json5")

The test dataset has the same structure as the full dataset but a smaller data volume, making it suitable for:

Development and debugging
CI/CD pipelines
Learning the STRIDE workflow
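For CI/CD pipelines, one option is to switch to the test dataset automatically. The sketch below assumes the common convention that CI services (GitHub Actions, GitLab CI, and others) set CI=true; choose_dataset and create_kwargs are hypothetical helpers, and passing no dataset argument keeps the full-data default shown above:

```python
import os
from typing import Mapping, Optional


def choose_dataset(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return "global-test" when running under CI, otherwise None (full dataset)."""
    env = os.environ if env is None else env
    if env.get("CI", "").lower() in ("1", "true", "yes"):
        return "global-test"
    return None


def create_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Keyword arguments for Project.create; empty means the default dataset."""
    dataset = choose_dataset(env)
    return {} if dataset is None else {"dataset": dataset}


# Usage with the Python API shown above (requires STRIDE to be installed):
# from stride import Project
# project = Project.create("my_config.json5", **create_kwargs())
```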

Error Handling

Common download issues:

Dataset not found - Ensure the dataset name matches a known dataset (run stride datasets list-remote to see them) or provide a valid --url
Authentication required - Install the GitHub CLI and authenticate with gh auth login
Destination exists - Remove the existing dataset directory before downloading again
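The "dataset not found" and "destination exists" checks can both run before any network traffic. The sketch below is hypothetical: KNOWN_DATASETS contains only the names listed on this page, and the overwrite flag illustrates removing the old directory first, not an option STRIDE is documented to have:

```python
import shutil
from pathlib import Path
from typing import Optional

KNOWN_DATASETS = {"global", "global-test"}  # names listed on this page


def prepare_destination(dataset: str, data_dir: Path, url: Optional[str] = None,
                        overwrite: bool = False) -> Path:
    """Validate the request and return the directory the dataset will be extracted to."""
    if url is None and dataset not in KNOWN_DATASETS:
        raise ValueError(f"Dataset not found: {dataset!r}; use a known name or pass --url")
    dest = data_dir / dataset
    if dest.exists():
        if not overwrite:
            raise FileExistsError(f"{dest} exists; remove it before downloading again")
        shutil.rmtree(dest)  # delete the previous copy so extraction starts clean
    return dest
```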