Download Datasets
STRIDE provides a CLI command to download datasets from remote repositories. This guide shows how to download pre-configured datasets as well as custom datasets from GitHub.
Prerequisites
STRIDE installed and available in your environment
For private repositories: GitHub CLI (gh) installed and authenticated
List Available Datasets
To list the known datasets that can be downloaded, along with their available versions:
$ stride datasets list-remote
This displays each dataset’s name, repository, subdirectory, description, and available versions. A dataset may also have an associated test dataset (shown as test_subdirectory), which is downloaded automatically alongside the main dataset.
Download a Known Dataset
To download a known dataset to the default location (~/.stride/data, or the directory named by STRIDE_DATA_DIR if that variable is set):
$ stride datasets download global
This single command downloads both the full global dataset and the smaller global-test subset from the same release archive. The test dataset enables faster iteration during development.
Specify a Data Directory
Use the -d or --data-dir option to download to a specific location:
$ stride datasets download global -d ./my_data
Alternatively, set the STRIDE_DATA_DIR environment variable for a persistent default:
$ export STRIDE_DATA_DIR=/path/to/data
$ stride datasets download global
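The lookup order described above can be sketched as a small shell function. This is a hypothetical helper, not part of STRIDE, and the precedence it encodes (-d/--data-dir first, then STRIDE_DATA_DIR, then ~/.stride/data) is an assumption inferred from this guide rather than taken from STRIDE's source.

```shell
# Hypothetical helper (not part of STRIDE): prints the directory a download
# is expected to use. Assumed precedence, inferred from this guide:
#   1. an explicit directory (what you would pass to -d/--data-dir)
#   2. the STRIDE_DATA_DIR environment variable
#   3. the default ~/.stride/data
resolve_stride_data_dir() {
  local explicit_dir="${1:-}"
  if [ -n "$explicit_dir" ]; then
    printf '%s\n' "$explicit_dir"
  elif [ -n "${STRIDE_DATA_DIR:-}" ]; then
    printf '%s\n' "$STRIDE_DATA_DIR"
  else
    printf '%s\n' "$HOME/.stride/data"
  fi
}
```

For example, ls "$(resolve_stride_data_dir)" shows where a download without -d is expected to land.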
Specify a Version
By default, the latest release is downloaded. To download a specific version:
$ stride datasets download global -v v0.2.0
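For reproducible setups such as CI jobs, it can help to pin the version in one place. The sketch below is a hypothetical wrapper: PINNED_DATASET_VERSION is an illustrative variable that STRIDE itself does not read, and the wrapper only passes the -v option shown above.

```shell
# Hypothetical wrapper (not part of STRIDE): downloads a dataset at the
# version in PINNED_DATASET_VERSION, or the latest release if it is empty.
# PINNED_DATASET_VERSION is an illustrative name; STRIDE does not read it.
stride_download_pinned() {
  local name="$1"
  if [ -n "${PINNED_DATASET_VERSION:-}" ]; then
    stride datasets download "$name" -v "$PINNED_DATASET_VERSION"
  else
    # No pin: fall back to STRIDE's default (the latest release).
    stride datasets download "$name"
  fi
}
```

Usage: PINNED_DATASET_VERSION=v0.2.0 stride_download_pinned global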
Download from a Custom Repository
To download a dataset from any GitHub repository, use the --url and --subdirectory options:
$ stride datasets download --url https://github.com/owner/repo --subdirectory data
Note
The --subdirectory option is required when using --url.
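Because --subdirectory is required with --url, a script can check its arguments before calling STRIDE. The function below is a hypothetical sketch, not part of STRIDE; restricting URLs to the https://github.com/<owner>/<repo> form is an extra assumption you may need to relax.

```shell
# Hypothetical wrapper (not part of STRIDE): refuses to call STRIDE unless
# both a repository URL and a subdirectory are given, mirroring the note above.
# Restricting URLs to https://github.com/<owner>/<repo> is an extra assumption.
stride_download_custom() {
  local url="${1:-}" subdir="${2:-}"
  if [ -z "$url" ] || [ -z "$subdir" ]; then
    echo "usage: stride_download_custom <github-url> <subdirectory>" >&2
    return 2
  fi
  case "$url" in
    https://github.com/*/*) ;;  # looks like a GitHub repository URL
    *)
      echo "expected https://github.com/<owner>/<repo>, got: $url" >&2
      return 2
      ;;
  esac
  stride datasets download --url "$url" --subdirectory "$subdir"
}
```

Usage: stride_download_custom https://github.com/owner/repo data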
Private Repository Authentication
For private repositories, STRIDE automatically uses your GitHub CLI authentication. Ensure you are logged in:
$ gh auth status
If not authenticated, run:
$ gh auth login
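In scripts, the check above can run as a non-interactive pre-flight step. This is a hypothetical sketch, not part of STRIDE or gh; it assumes that gh auth status exits with a non-zero status when no account is logged in, and it reports the problem instead of launching the interactive gh auth login.

```shell
# Hypothetical pre-flight check (not part of STRIDE): confirms the GitHub CLI
# is installed and logged in before downloading from a private repository.
# It only reports problems; it does not run the interactive `gh auth login`.
require_gh_auth() {
  if ! command -v gh >/dev/null 2>&1; then
    echo "GitHub CLI (gh) not found; install it to access private repositories." >&2
    return 1
  fi
  # Assumption: `gh auth status` exits non-zero when not authenticated.
  if ! gh auth status >/dev/null 2>&1; then
    echo "gh is not authenticated; run: gh auth login" >&2
    return 1
  fi
}
```

Usage: require_gh_auth && stride datasets download --url https://github.com/owner/repo --subdirectory data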