Inspect files
You have CSV, Parquet, or DuckDB files and want to understand what's in them. datasight can profile the data, surface measures and dimensions, suggest trends, and generate starter prompts — all without setting up a project or calling an LLM.
One command: datasight inspect
datasight inspect runs every deterministic analysis in one shot:
datasight inspect generation.parquet
datasight inspect generation.csv plants.csv
datasight inspect data_dir/
This prints:
- Profile — table and column counts, row counts, largest tables, date coverage
- Quality — null-heavy columns, suspicious numeric ranges, notes
- Measures — inferred metrics with roles, default aggregations, rollup SQL
- Dimensions — grouping candidates with distinct counts and sample values
- Trends — date/measure pairs with chart recommendations
- Recipes — suggested prompts for deeper exploration
By default, nothing is written to disk. Use --format json or --format markdown to
change the output format, or -o report.md to save the report to a file:
datasight inspect generation.parquet --format json
datasight inspect generation.parquet --format markdown -o overview.md
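To make the Quality and Dimensions sections listed above more concrete, here is a minimal sketch of the kind of deterministic check they describe: per-column null fractions, distinct counts, and sample values for a CSV. This is not datasight's implementation. The function name `profile_columns`, the 0.5 threshold, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical threshold: flag a column when more than half of its cells are empty.
NULL_HEAVY_THRESHOLD = 0.5


def profile_columns(csv_text, sample_size=3):
    """Return null fraction, distinct count, and sample values for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    total_rows = 0
    nulls = Counter()
    # Distinct non-empty values per column, kept in first-seen order.
    seen = {name: [] for name in columns}

    for row in reader:
        total_rows += 1
        for name in columns:
            cell = (row.get(name) or "").strip()
            if cell == "":
                nulls[name] += 1
            elif cell not in seen[name]:
                seen[name].append(cell)

    report = {}
    for name in columns:
        null_fraction = nulls[name] / total_rows if total_rows else 0.0
        report[name] = {
            "null_fraction": null_fraction,
            "null_heavy": null_fraction > NULL_HEAVY_THRESHOLD,
            "distinct": len(seen[name]),
            "samples": seen[name][:sample_size],
        }
    return report


# Made-up sample data, used only to exercise the sketch.
SAMPLE = """plant_id,fuel,net_mwh,notes
1,coal,120.5,
2,gas,98.0,
3,gas,,retired
4,wind,45.2,
"""

report = profile_columns(SAMPLE)
for name, stats in report.items():
    print(name, stats)
```

In this sample, `notes` is empty in three of four rows and is flagged as null-heavy, while `fuel` has few distinct values and no nulls, which makes it a plausible grouping candidate.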
Explore files in the web UI
If you prefer a visual interface, the web UI can also work with files directly — no project setup required:
Open http://localhost:8084. The landing page shows:
- Guided starters: choose a workflow like Profile this dataset, Find key dimensions, Build a trend chart, or Audit nulls and outliers. datasight runs the selected starter as soon as your data loads.
- Configure your LLM: if needed, enter your provider and API key. (If you exported ANTHROPIC_API_KEY or similar in your shell, this step is skipped.)
- Explore Files: enter the path to your CSV, Parquet, or DuckDB file (or a directory of Parquet files) and click Explore.
datasight creates an in-memory database, introspects the schema, and drops you into the chat UI.
Adding more files
Use the input at the top of the sidebar (below Tables) to add more files to your session at any time.
Save as a project
Once you're comfortable with your data, click Save in the header to persist your session as a project. datasight will:
- Create a project directory with a DuckDB database (views pointing to your original files — no data copying)
- Auto-generate schema_description.md and queries.yaml using the LLM
- Seed a measures.yaml scaffold from the inferred semantic measures
See Set up a project for the full project workflow.
Generate project files from files
datasight generate can also work directly with files to auto-generate
schema documentation. Examples:
# Reference an existing DuckDB database directly
datasight generate generation.duckdb
# Reference an existing SQLite database directly
datasight generate generation.sqlite
# Create ./database.duckdb from CSV inputs
datasight generate generation.csv plants.csv
# Create ./database.duckdb from Parquet inputs
datasight generate generation.parquet plants.parquet
# Create a custom project DuckDB from CSV inputs
datasight generate generation.csv plants.csv --db-path db/project.duckdb
# Create a custom project DuckDB from Parquet inputs
datasight generate generation.parquet plants.parquet --db-path db/project.duckdb
--db-path is an output path. Use it only when datasight is creating a
DuckDB project database from CSV, Parquet, or mixed file inputs. Do not
use --db-path with a single existing DuckDB or SQLite database; those
files are referenced directly in .env.
This creates schema_description.md, queries.yaml, measures.yaml,
and time_series.yaml in the current directory using the LLM. See
Set up a project for details.
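The --db-path rule described above can be summarized as a small decision function. This is a sketch of the documented behavior, not datasight's code. The function name `plan_generate`, the returned dictionary shape, and detecting database files by extension are assumptions. The documentation does not say how a DuckDB or SQLite file is treated when combined with other inputs, so the sketch simply treats any multi-file input as a "create" case.

```python
from pathlib import Path

# Assumption: existing databases are recognized by file extension.
DATABASE_SUFFIXES = {".duckdb", ".sqlite"}
# Default output path named in the examples above.
DEFAULT_DB_PATH = "./database.duckdb"


def plan_generate(inputs, db_path=None):
    """Decide whether inputs are referenced directly or loaded into a new DuckDB file."""
    if not inputs:
        raise ValueError("at least one input file is required")

    single_database = (
        len(inputs) == 1 and Path(inputs[0]).suffix.lower() in DATABASE_SUFFIXES
    )
    if single_database:
        # A lone DuckDB or SQLite file is referenced as-is, so there is no output path.
        if db_path is not None:
            raise ValueError("--db-path is an output path; omit it for a single database")
        return {"mode": "reference", "database": inputs[0]}

    # CSV, Parquet, or mixed inputs: a project DuckDB database is created.
    return {"mode": "create", "database": db_path or DEFAULT_DB_PATH}


print(plan_generate(["generation.duckdb"]))
print(plan_generate(["generation.csv", "plants.csv"]))
print(plan_generate(["generation.csv", "plants.csv"], db_path="db/project.duckdb"))
```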
Supported file types
| Type | Example | How it's handled |
|---|---|---|
| CSV | data.csv | Loaded via DuckDB's read_csv_auto |
| Parquet | data.parquet | Loaded via DuckDB's read_parquet |
| DuckDB | data.duckdb | Referenced directly when it is the only input |
| SQLite | data.sqlite | Referenced directly when it is the only input |
| Parquet directory | data_dir/ | Hive-partitioned Parquet loaded via a read_parquet glob |
| CSV directory | data_dir/ | All CSVs loaded via a read_csv_auto glob |
For CSV and Parquet inputs, each file becomes a view in an ephemeral
in-memory DuckDB database while documentation is generated. The view name
is derived from the filename (e.g. generation.parquet becomes the
generation view).
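As an illustration of the filename-to-view mapping, the sketch below builds the kind of CREATE VIEW statement that would expose a file through DuckDB's read_csv_auto or read_parquet. The helper names `view_name` and `create_view_sql` are hypothetical, and datasight may sanitize names or pass reader options differently. The code only builds SQL strings and does not need DuckDB installed.

```python
from pathlib import Path

# DuckDB table functions named in the table above, keyed by file extension.
READERS = {".csv": "read_csv_auto", ".parquet": "read_parquet"}


def view_name(path):
    """Derive a view name from the filename by dropping directory and extension."""
    return Path(path).stem


def create_view_sql(path):
    """Build a CREATE VIEW statement that reads the file in place (no data copying)."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {path}")
    # Quote the identifier and escape the string literal so unusual names stay valid SQL.
    identifier = '"' + view_name(path).replace('"', '""') + '"'
    literal = str(path).replace("'", "''")
    return f"CREATE VIEW {identifier} AS SELECT * FROM {READERS[suffix]}('{literal}')"


for example in ["generation.parquet", "data/plants.csv"]:
    print(create_view_sql(example))
```

Because the statement defines a view rather than a table, queries read the original file each time, which matches the "no data copying" behavior described for saved projects.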