# What the AI sees
datasight sends information to an LLM to translate your question into SQL. This page explains exactly what is and isn't included in those calls.
## What datasight sends
| Item | Why it's sent |
|---|---|
| Table and column names | The AI needs the schema to write correct SQL |
| Column data types | Helps the AI pick appropriate aggregations and filters |
| Row counts per table | Helps the AI reason about the size and shape of the data |
| Your `schema_description.md` | The plain-English context you wrote to explain your data |
| Your `queries.yaml` examples | Shows the AI the correct SQL patterns for your schema |
| Small result samples | Used to generate a plain-English answer after the query runs (typically a few rows) |
| Your natural-language question | The text you typed in the chat input |
| The current conversation history | Prior questions, generated SQL, and the AI's prior responses are re-sent on every follow-up turn so the model has context |
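To make the first three rows of the table concrete, here is a minimal sketch of the kind of metadata collection they describe — table names, column names and types, and row counts — using Python's built-in `sqlite3` module. This is illustrative only; it is not datasight's actual implementation or payload format.

```python
import sqlite3

# Build a tiny in-memory database to introspect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, total_usd REAL);
    INSERT INTO orders VALUES (1, 'Ada', 19.99), (2, 'Grace', 5.00);
""")

def schema_metadata(conn):
    """Collect only structural metadata: names, types, row counts.
    No data values are read out of the tables themselves."""
    meta = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        cols = [(c[1], c[2]) for c in conn.execute(f"PRAGMA table_info({t})")]
        rows = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        meta[t] = {"columns": cols, "row_count": rows}
    return meta

print(schema_metadata(conn))
# {'orders': {'columns': [('id', 'INTEGER'), ('customer', 'TEXT'),
#                         ('total_usd', 'REAL')], 'row_count': 2}}
```

Note that the dictionary contains the customer names' column, but never the names "Ada" or "Grace" themselves.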
> **Conversation history accumulates**
>
> Each follow-up turn re-sends the full chat. If a question or its result happens to
> contain something sensitive, that content stays in context until you start a new
> chat (**New Chat** in the header, or the `N` keyboard shortcut).
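The accumulation behavior can be sketched in a few lines. This is an illustrative model of a stateless chat API, not datasight's internal message format: each request carries the entire history so far, so anything sensitive in an earlier turn travels with every later one.

```python
# Illustrative sketch: a stateless LLM API receives the full message
# list on every call, so context grows turn by turn.
history = []

def build_request(question, history):
    history.append({"role": "user", "content": question})
    return list(history)  # snapshot of everything that gets sent this turn

r1 = build_request("total sales by month?", history)
r2 = build_request("now just for 2024", history)

print(len(r1), len(r2))
# 1 2  -- the second request re-sends the first question too
```

Starting a new chat corresponds to resetting `history` to an empty list.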
## What datasight does NOT send
- Full table contents. Raw data values are never uploaded. Only the small result-row samples used to summarize an answer reach the LLM.
- Your `.env` file or API keys. Configuration is read locally and never transmitted.
- Raw files. If you load a CSV or Parquet file, the file stays on your machine. The AI only sees the inferred column names and types.
- Filesystem paths beyond the table and column names introspected from the database.
- Other files or directories outside the open project.
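To illustrate the raw-files point above: when a CSV is loaded, only the header names and inferred types are the kind of information that would reach the AI, never the cell values. Here is a deliberately minimal inference pass using only the standard library (datasight's real type inference is not documented here and is surely more thorough):

```python
import csv
import io

# A CSV with clearly sensitive values -- none of which appear in the output.
sample = io.StringIO("patient_id,visit_date,charge_usd\n101,2024-01-05,250.0\n")

def infer_schema(f):
    """Return (column_name, inferred_type) pairs from the first data row."""
    reader = csv.reader(f)
    header = next(reader)
    first_row = next(reader)

    def kind(value):
        for cast, name in ((int, "INTEGER"), (float, "REAL")):
            try:
                cast(value)
                return name
            except ValueError:
                pass
        return "TEXT"

    return list(zip(header, (kind(v) for v in first_row)))

print(infer_schema(sample))
# [('patient_id', 'INTEGER'), ('visit_date', 'TEXT'), ('charge_usd', 'REAL')]
```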
## Column names and samples can still be sensitive
Even though full data is never sent, column names like `patient_id` or `salary_usd` and
sampled result rows may themselves be sensitive. Treat the schema description and any
example queries you write as data-sensitivity artifacts — the AI will see them.
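One practical habit is to audit column names for obviously sensitive terms before writing a schema description. The following is a hypothetical helper, not a datasight feature, and the keyword list is an assumption you would tailor to your own domain:

```python
import re

# Assumed keyword list -- extend it for your own data domain.
SENSITIVE = re.compile(r"(ssn|salary|patient|dob|email|password)", re.IGNORECASE)

def flag_sensitive(columns):
    """Return the subset of column names that match a sensitive keyword."""
    return [c for c in columns if SENSITIVE.search(c)]

print(flag_sensitive(["order_id", "patient_id", "salary_usd", "region"]))
# ['patient_id', 'salary_usd']
```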
## Hosted APIs vs. local models
When datasight calls a hosted API (Anthropic, OpenAI, GitHub Models), the schema information and result samples leave your machine over the internet.
If your data sensitivity requirements prohibit that, you have two alternatives:
- Local Ollama — runs a model on your hardware; nothing leaves the machine.
- Secure hosted endpoint — Anthropic on AWS Bedrock, Azure OpenAI, or a corporate gateway with a data-processing agreement.
See Choosing an AI provider for guidance on which option fits your situation.