Choosing an LLM¶
datasight works with any of several LLM backends. The right choice depends on your data sensitivity, budget, and whether you want to run a model locally. This page helps you pick one without reading every provider's pricing page.
Quick decision guide¶
| Your situation | Start with |
|---|---|
| Trying datasight for the first time, non-sensitive data | Anthropic Claude Haiku or OpenAI GPT-4o-mini |
| Want zero cost, don't mind rate limits | GitHub Models (free tier, recommended over Ollama for most users) |
| Already have an OpenAI key | OpenAI (gpt-4o-mini or gpt-4.1-mini) |
| Data is sensitive and must not leave your network | Local Ollama (laptop or HPC GPU node) |
| Data is sensitive but you have a secure hosted endpoint | Anthropic on Bedrock or Azure OpenAI (custom base_url) |
| Writing SQL for a well-documented schema | Haiku / GPT-4o-mini is usually enough |
| Complex multi-step analytical questions, poor results from the cheap tier | Step up to Sonnet or GPT-4o |
When in doubt, start with Haiku. datasight's main job — turning a question into SQL against a documented schema — is not a frontier-model task, and Haiku handles it well for most projects.
Factor 1: data sensitivity¶
This is the first question to answer, because it rules some options out.
- Non-sensitive or already-public data. Any hosted provider is fine. Only the SQL and sampled result rows leave your machine; datasight does not upload raw files.
- Sensitive data where a hosted API is acceptable under a BAA or enterprise agreement. Use a secure endpoint such as Anthropic on AWS Bedrock, Azure OpenAI, or a corporate gateway. Configure datasight with the provider's base_url pointing at your endpoint.
- Sensitive data that must not traverse the public internet at all. Run a local model via Ollama — on your laptop, or on an HPC GPU node (see below).
Note: even with a hosted API, the data values that reach the LLM are limited to column names, schema descriptions, example queries, and small result samples used for summarization. Full tables are never uploaded. That said, column names and sample rows can themselves be sensitive, so treat them accordingly.
Factor 2: cost¶
For the hosted options, rough order of magnitude (check current pricing — these move):
- GitHub Models — free for a generous monthly quota, rate-limited. Great for evaluation and light use. Provides access to GPT, Llama, and other open models through a single GitHub token. Note: the free tier caps requests at 8,000 tokens, which is easy to exceed on databases with many tables or wide tables. If you hit context-length errors, see Limit schema sent to the LLM.
- Cheap hosted tier (Anthropic Haiku, OpenAI GPT-4o-mini / GPT-4.1-mini) — typical datasight sessions cost pennies to single-digit cents.
- Mid hosted tier (Anthropic Sonnet, OpenAI GPT-4o) — roughly 5× the cheap tier. Noticeably better at ambiguous questions and multi-step reasoning.
- Top hosted tier (Anthropic Opus, OpenAI's largest model) — roughly 5× the mid tier. Rarely needed for datasight's workload. If the mid tier is struggling on your schema, better schema descriptions and example queries usually help more than jumping a tier.
A practical starting rule: use the cheap tier until you can point to specific questions it gets wrong, then try the mid tier on just those.
Factor 3: local models with Ollama¶
Local models cost nothing per query, keep data on your hardware, and work offline — at the price of needing GPU memory and slower inference than hosted APIs.
Sizing rule of thumb¶
VRAM needed ≈ model parameter count × bytes per parameter, plus some overhead for context.
- 4-bit quantized (Ollama default): ~0.5 GB per billion parameters
- 8-bit: ~1 GB per billion parameters
- 16-bit (fp16): ~2 GB per billion parameters
So a Llama 3.1 8B model fits in ~5 GB VRAM at 4-bit, a 70B model needs ~40 GB, and a 405B model needs ~200+ GB.
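As a quick sanity check, the rule of thumb can be wrapped in a small shell helper. This is a sketch: the overhead term is a rough allowance for context/KV cache, not a measured value, and `vram_estimate` is a hypothetical helper name, not part of datasight.

```shell
# Rough VRAM estimate: params (billions) × bytes per parameter + overhead (GB).
# Bytes per parameter: 0.5 (4-bit), 1 (8-bit), 2 (fp16). Overhead defaults to 1 GB.
vram_estimate() {
  awk -v p="$1" -v b="$2" -v o="${3:-1}" 'BEGIN { printf "%.1f GB\n", p * b + o }'
}

vram_estimate 8 0.5     # Llama 3.1 8B at 4-bit: about 5 GB
vram_estimate 70 0.5 5  # 70B at 4-bit: about 40 GB
vram_estimate 70 2 5    # 70B at fp16: about 145 GB
```

If the estimate lands near your GPU's limit, assume it won't fit — long contexts and wide schemas push real usage above the estimate.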
On a laptop¶
| Laptop hardware | What fits comfortably |
|---|---|
| Apple Silicon with 16 GB unified memory | 7–8B models at 4-bit |
| Apple Silicon with 32 GB | 13B at 4-bit, or 8B at 8-bit |
| Apple Silicon with 64 GB+ | 34–70B at 4-bit |
| NVIDIA laptop GPU, 8 GB VRAM | 7–8B at 4-bit |
| NVIDIA laptop GPU, 16 GB VRAM | 13B at 4-bit |
For datasight's SQL-generation workload, qwen2.5:7b is the recommended starting point for CLI queries (datasight ask). For the web UI with visualizations, step up to qwen2.5:14b — the 7B model struggles with the more complex multi-step agent interactions required for chart generation. Models much smaller than 7B often struggle with realistic schemas.
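To try the recommended models locally, the standard Ollama CLI flow looks like the following (a sketch assuming Ollama is installed and its server is running; the smoke-test prompt is illustrative):

```shell
# Pull the recommended model for CLI queries
ollama pull qwen2.5:7b

# Optionally pull the larger model for the web UI with visualizations
ollama pull qwen2.5:14b

# Smoke-test the model before pointing datasight at it
ollama run qwen2.5:7b "Write a SQL query that counts rows in a table named events."
```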
On an HPC GPU node¶
If your HPC has GPU nodes, they typically unlock much larger models.
NREL Kestrel as a concrete example. Kestrel has 156 GPU nodes, each with 4 NVIDIA H100 SXM GPUs (80 GB VRAM each, 320 GB per node) and 384–1536 GB system RAM. On a single Kestrel GPU node you can run:
- Llama 3.1 70B at fp16 (~140 GB) with headroom to spare
- Llama 3.1 405B at 4-bit quantization (~200 GB) across the 4 GPUs
- Multiple mid-sized models concurrently
Kestrel's debug partition lets you request up to half a GPU node for 4 hours without a large allocation — a practical way to try local models before committing resources.
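For reference, an interactive debug-partition request on a Slurm system like Kestrel might look like the following. This is a sketch — the partition name, GPU syntax, and allocation handle are assumptions; check your site's documentation for the exact flags.

```shell
# Request 2 of a node's 4 GPUs on the debug partition for 4 hours.
# <your-allocation> is a placeholder for your project's allocation handle.
salloc --partition=debug --gres=gpu:2 --time=4:00:00 --account=<your-allocation>
```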
See Run on an HPC compute node for the deployment pattern (datasight runs on the compute node, tunnels back to your laptop browser).
When hosted beats local¶
A hosted cheap-tier call (Haiku or GPT-4o-mini) often produces better SQL than a locally-run 8B model, at a fraction of a cent. GitHub Models offers a free tier that handles the full datasight feature set — including visualizations — better than most local models. Don't reach for local models just to avoid hosted costs — reach for them when data sensitivity or offline use requires it.
Factor 4: where the LLM call originates¶
datasight makes its LLM calls from wherever the datasight process is running. That matters when you're combining a remote data backend with any kind of policy or network constraint:
- datasight on your laptop + local data — LLM call from laptop.
- datasight on an HPC compute node — LLM call from the compute node. Good fit if you want to use the compute node's GPU for a local model, or if hosted API keys are configured there.
- datasight on your laptop + remote Flight SQL backend on HPC — LLM call from laptop, SQL executed on HPC. Good fit if your laptop has the GPU you want to use, or if compute-node egress to hosted APIs is blocked.
See the two HPC how-tos for the tradeoffs: Run on an HPC compute node and Connect to a remote Flight SQL backend.
Configuring your choice¶
Once you've picked a provider, see the Install and configure an LLM how-to for the exact environment variables. The short version:
```shell
# Anthropic (default)
ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...

# GitHub Models
LLM_PROVIDER=github
GITHUB_TOKEN=ghp-...

# Ollama (local — use for cost/data-security reasons)
LLM_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5:7b  # CLI queries; use qwen2.5:14b for web UI with viz
```
A secure hosted endpoint (Bedrock, Azure OpenAI, corporate proxy) is configured by setting ANTHROPIC_BASE_URL or OPENAI_BASE_URL alongside the credentials.
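For example, pointing the OpenAI provider at a gateway or Azure OpenAI endpoint might look like this — the URL is a placeholder for your organization's endpoint, and the key is whatever credential that endpoint issues:

```shell
LLM_PROVIDER=openai
OPENAI_API_KEY=...                                  # credential issued by your gateway
OPENAI_BASE_URL=https://your-gateway.example.com/v1 # placeholder endpoint
```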