Contributing

Development setup

The project uses uv to manage the Python environment and pin dependencies via uv.lock. Install uv first (see the uv install guide), then:

git clone https://github.com/dsgrid/datasight.git
cd datasight
uv sync --extra dev
. .venv/bin/activate

uv sync creates .venv/ automatically and installs the project plus the dev extras pinned in uv.lock.

Project structure

src/datasight/
├── cli.py              # Click CLI commands (run, ask, init, demo, generate, verify, profile, quality, doctor, export, log)
├── agent.py            # Shared agent loop and tool execution
├── config.py           # Configuration helpers
├── data_profile.py     # Deterministic dataset overviews and CLI/web recipes
├── schema.py           # Database introspection
├── llm.py              # LLM client abstraction
├── chart.py            # Plotly chart generator
├── runner.py           # SQL execution backends (DuckDB, SQLite, Postgres, Flight SQL)
├── export.py           # Session-to-HTML export
├── verify.py           # Query verification engine
├── demo.py             # Demo dataset generator
└── web/
    ├── app.py          # FastAPI server + SSE streaming
    ├── static/         # Vite build output (assets/) + icons
    └── templates/
        └── index.html  # Generated by Vite build
frontend/               # Svelte 5 + TypeScript + Tailwind source
├── src/
│   ├── App.svelte      # Root component
│   ├── main.ts         # Entry point
│   ├── app.css         # Tailwind + design tokens
│   └── lib/
│       ├── stores/     # Svelte 5 rune-based stores
│       ├── api/        # Typed API client functions
│       ├── components/ # ~40 Svelte components
│       └── utils/      # Search, format, markdown utilities
├── tests/              # Vitest unit tests
└── e2e/                # Playwright E2E tests

Running locally

# Build the generated web assets once after a clean checkout
bash scripts/build-frontend.sh

# Start with a demo project
datasight demo ./dev-project
cd dev-project
# Edit .env with your API key (Anthropic, GitHub token, or Ollama)
datasight run -v

The -v flag enables debug logging, which shows the full LLM request/response cycle including tool calls.

The FastAPI app serves generated files from src/datasight/web/static/ and src/datasight/web/templates/index.html. Those files are ignored by git. Run bash scripts/build-frontend.sh after a clean checkout and whenever you want datasight run to serve a freshly built production UI.

Pre-commit hooks

The project uses prek — a drop-in replacement for pre-commit — to run checks automatically on every commit. It reads the same .pre-commit-config.yaml. Install the hooks after cloning:

prek install

Hooks run ruff (linting, with autofix), ruff-format (formatting), ty (type checking), and a docs CLI reference drift check. If a hook fails, it either auto-fixes the offending files (as ruff-format does) or shows you what to fix. Stage the fixes and commit again.

To run all hooks manually against every file:

prek run --all-files
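
For orientation, a .pre-commit-config.yaml for this set of hooks can be shaped roughly like the sketch below. Treat it as illustrative only: the revision pin, the local-hook wiring for ty, and the drift-check entry are assumptions, not this repository's exact configuration.

```yaml
# Illustrative sketch only -- revs and local-hook entries are assumptions.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.0          # pin to whatever this repo's config actually specifies
    hooks:
      - id: ruff         # lint, with autofix
        args: [--fix]
      - id: ruff-format  # formatting
  - repo: local
    hooks:
      - id: ty
        name: ty type check
        entry: ty check
        language: system
        types: [python]
        pass_filenames: false
      - id: cli-docs-drift
        name: docs CLI reference drift check
        entry: python scripts/generate_cli_reference.py  # hypothetical wiring
        language: system
        pass_filenames: false
```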

Code style

The project uses ruff for linting and formatting, and ty for type checking.

# Run manually
ruff check src/
ruff format src/
ty check

Frontend

The frontend is built with Svelte 5 + TypeScript + Tailwind CSS and uses Vite as the build tool. Source lives in frontend/.

cd frontend
npm install
npm run dev           # Vite dev server on :5173 (proxies /api to :8084)
npm run check         # Svelte + TypeScript checks
npm test              # Vitest unit tests
npm run build         # Production build
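
The /api proxy noted above lives in Vite's dev-server config. A hedged sketch of the relevant frontend/vite.config.ts fragment (the real file may differ, e.g. in plugins and extra options):

```typescript
// Sketch of the dev-server proxy wiring; the actual config may differ.
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    port: 5173,
    proxy: {
      // Forward API calls from the Vite dev server to the datasight backend.
      "/api": "http://localhost:8084",
    },
  },
});
```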

For frontend development, run datasight run in another terminal for the API server, then use npm run dev from frontend/. To test the exact production UI served by FastAPI, build and copy the frontend output into FastAPI's serving directories:

bash scripts/build-frontend.sh

Release builds run this script before packaging. Hatch includes the generated assets in the sdist and wheel as build artifacts, so the repository stays free of generated frontend bundles while published packages still contain the web UI.
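
In Hatch terms, including git-ignored build output is done with the artifacts option. A sketch of the pyproject.toml fragment (the exact patterns in this repository may differ):

```toml
# Illustrative sketch: exact glob patterns are assumptions.
[tool.hatch.build]
# Ship the generated web assets even though they are ignored by git.
artifacts = [
  "src/datasight/web/static/",
  "src/datasight/web/templates/index.html",
]
```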

Testing

# Run the full Python test suite
pytest

# Run the CI-safe Python suite, excluding local Ollama integration tests
pytest -m "not integration"

# Run only tests that require a local Ollama model
pytest -m integration

# Frontend unit tests (Vitest)
cd frontend && npm test

# Frontend E2E tests (Playwright, requires datasight run)
cd frontend && npm run test:e2e

# Run the verification suite against a demo project
datasight demo ./test-project
cd test-project
datasight verify -v

Tests marked integration require a running local Ollama server with the qwen2.5:7b model available:

ollama pull qwen2.5:7b
ollama serve
pytest -m integration

Keep live-provider tests behind that marker so CI can run deterministically without a local model.
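
For -m integration selection to work without unknown-marker warnings, the marker has to be registered with pytest. A typical pyproject.toml registration looks like this (assumed shape, not copied from this repository's config):

```toml
[tool.pytest.ini_options]
markers = [
  "integration: requires a running local Ollama server",
]
```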

Documentation

Docs use Zensical.

uv sync --extra dev       # docs tooling is part of the dev extras
. .venv/bin/activate
zensical serve            # live-reload preview
zensical build            # one-off static build

When you change Click commands or help text, regenerate the static CLI docs:

python scripts/generate_cli_reference.py

Regenerating UI screenshots

Screenshots embedded in the docs live in docs/assets/screenshots/ and are captured by a dedicated Playwright spec at frontend/e2e/screenshots.spec.ts. They're excluded from the regular npm run test:e2e run (via --grep-invert screenshots) because they need a project-loaded server and specific UI state.

Regenerate when the UI changes in a way that makes the committed images stale.

  1. Start a server with a demo project in one terminal:

    datasight demo eia-generation ~/datasight-eia-demo
    datasight run --project-dir ~/datasight-eia-demo
    
  2. In the live browser, run a chart-producing query (e.g. "Show monthly generation by fuel type as a line chart") and pin at least one card to the dashboard. The capture spec loads the first saved conversation for the chart screenshot, and reads dashboard state for the dashboard screenshot.

  3. Run the capture spec in a second terminal:

    cd frontend
    npm run capture-screenshots
    

    To regenerate a single view, filter by test name:

    npm run capture-screenshots -- --grep chart-result
    

All screenshots are captured in dark mode at a fixed 1280×800 viewport for visual consistency. The landing test intercepts GET /api/project so it renders the no-project landing page even against a server that has a project loaded.