Download Datasets
Prerequisites
STRIDE installed and available in your environment
For private repositories: GitHub CLI (gh) installed and authenticated
List available datasets
$ stride datasets list-remote
Download a known dataset
$ stride datasets download global
This downloads the dataset to ~/.stride/data, or to the directory named by STRIDE_DATA_DIR if that variable is set. The full dataset and its test subset are both downloaded automatically.
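The lookup order above can be checked from a script. The following POSIX sh sketch resolves the data directory the way the text describes and then confirms that both the dataset and its test subset are present. The function names `stride_data_dir` and `stride_dataset_present` are hypothetical, and the `<name>-test` directory naming is an assumption taken from the `global`/`global-test` example in the clone section; `stride datasets list-remote` shows the actual `test_subdirectory` for each dataset.

```sh
#!/bin/sh
# Resolve the STRIDE data directory: STRIDE_DATA_DIR if set, else ~/.stride/data.
# Assumption: an empty STRIDE_DATA_DIR is treated the same as an unset one.
stride_data_dir() {
  printf '%s\n' "${STRIDE_DATA_DIR:-$HOME/.stride/data}"
}

# Hypothetical check: succeed only if both the dataset directory and its
# test subset (assumed to be named "<name>-test") exist in the data directory.
stride_dataset_present() {
  dir=$(stride_data_dir)
  [ -d "$dir/$1" ] && [ -d "$dir/$1-test" ]
}

# Usage: report whether the "global" dataset has been downloaded.
if stride_dataset_present global; then
  echo "global and global-test found in $(stride_data_dir)"
else
  echo "global is missing from $(stride_data_dir)"
fi
```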
Specify a version
$ stride datasets download global -v v0.2.0
Specify a data directory
$ stride datasets download global -d /path/to/data
Or set the environment variable for your shell:
macOS/Linux (bash, zsh):
$ export STRIDE_DATA_DIR=/path/to/data
Windows Command Prompt:
$ set STRIDE_DATA_DIR=/path/to/data
Windows PowerShell:
$ $Env:STRIDE_DATA_DIR = "/path/to/data"
Download from a custom repository
$ stride datasets download --url https://github.com/owner/repo --subdirectory data
Note: The --subdirectory option is required when using --url. It names the directory within the repository that contains the dataset.
Private repositories
Authenticate with GitHub CLI first:
$ gh auth login
Alternative: Clone directly
If gh is not available, clone the data repository with git and point STRIDE_DATA_DIR at the clone:
macOS/Linux (bash, zsh):
$ git clone https://github.com/dsgrid/stride-data.git
$ export STRIDE_DATA_DIR=/path/to/stride-data
Windows Command Prompt:
$ git clone https://github.com/dsgrid/stride-data.git
$ set STRIDE_DATA_DIR=/path/to/stride-data
Windows PowerShell:
$ git clone https://github.com/dsgrid/stride-data.git
$ $Env:STRIDE_DATA_DIR = "/path/to/stride-data"
Or copy to the default location:
$ git clone https://github.com/dsgrid/stride-data.git
$ mkdir -p ~/.stride/data
$ cp -r stride-data/global ~/.stride/data/
$ cp -r stride-data/global-test ~/.stride/data/
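The manual copy above can be wrapped in a small function that copies a dataset together with its test subset and respects STRIDE_DATA_DIR when it is set. This is a POSIX sh sketch, not part of STRIDE: the function name `stride_copy_dataset` is hypothetical, and it assumes the test subset sits next to the dataset in the clone as `<name>-test`, as `global-test` does for `global`.

```sh
#!/bin/sh
# Hypothetical helper: copy <name> and <name>-test from a local clone of
# stride-data into the STRIDE data directory (STRIDE_DATA_DIR or ~/.stride/data).
# Assumption: the test subset directory is named "<name>-test", as for "global".
stride_copy_dataset() {
  src=$1    # path to the cloned repository, e.g. ./stride-data
  name=$2   # dataset name, e.g. global
  dest=${STRIDE_DATA_DIR:-$HOME/.stride/data}

  # Check both sources first so a missing test subset does not leave a partial copy.
  for sub in "$name" "$name-test"; do
    [ -d "$src/$sub" ] || { echo "error: $src/$sub not found" >&2; return 1; }
  done

  # Copy directory contents (dir/.) so re-running does not nest a second
  # copy inside the first, which plain "cp -r dir dest/" would do.
  for sub in "$name" "$name-test"; do
    mkdir -p "$dest/$sub" && cp -R "$src/$sub/." "$dest/$sub/" || return 1
  done
}
```

For example, after cloning, `stride_copy_dataset stride-data global` performs the same copy as the commands above, and running it a second time refreshes the files in place.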
Background
Known datasets are hosted in the public stride-data repository (https://github.com/dsgrid/stride-data). The list-remote command displays each dataset’s name, repository, subdirectory, description, and available versions. A dataset may have an associated test subset, listed as test_subdirectory, which is downloaded automatically alongside the main dataset.
For private repositories, STRIDE uses GitHub CLI authentication. Check your authentication status with:
$ gh auth status
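Because `gh auth status` exits with a non-zero status when no valid login is found, a script can use it to choose between downloading through STRIDE and the clone-based alternative described above. A minimal POSIX sh sketch follows; the function name `stride_download_method` is hypothetical and is not part of STRIDE or GitHub CLI.

```sh
#!/bin/sh
# Hypothetical helper: print "gh" if GitHub CLI is installed and authenticated,
# otherwise print "clone" to signal the git-clone fallback.
stride_download_method() {
  if command -v gh >/dev/null 2>&1 && gh auth status >/dev/null 2>&1; then
    echo gh
  else
    echo clone
  fi
}
```

For a private repository, `[ "$(stride_download_method)" = gh ] && stride datasets download --url https://github.com/owner/repo --subdirectory data` attempts the download only when gh is ready, leaving the clone route for the other case.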