Explore files without a project¶
You have CSV, Parquet, Excel, or DuckDB files and want to understand what's in them. datasight can profile the data, surface measures and dimensions, suggest trends, and generate starter prompts.
Choose your entry point:
- One-shot text report (no AI needed) →
datasight inspectbelow - Interactive chat UI → Explore in the web UI
- Save as a permanent project → Set up a project
One command: datasight inspect¶
datasight inspect runs every built-in analysis in one shot (no AI call needed):
datasight inspect generation.parquet
datasight inspect generation.csv plants.csv
datasight inspect generation.xlsx
datasight inspect data_dir/
You can pass any mix of file types; they are loaded into an ephemeral DuckDB session for the duration of the command.
This prints:
- Profile — table and column counts, row counts, largest tables, date coverage
- Quality — null-heavy columns, suspicious numeric ranges, notes
- Measures — inferred metrics with roles, default aggregations, rollup SQL
- Dimensions — grouping candidates with distinct counts and sample values
- Trends — date/measure pairs with chart recommendations
- Recipes — suggested prompts for deeper exploration
Nothing is written to disk. Use --format json or --format markdown to
change the output, or -o report.md to save it to a file:
datasight inspect generation.parquet --format json
datasight inspect generation.parquet --format markdown -o overview.md
Explore files in the web UI¶
If you prefer a visual interface, the web UI can also work with files directly — no project setup required:
Open http://localhost:8084. The landing page shows:
-
Guided starters — choose a workflow like Profile this dataset, Find key dimensions, Build a trend chart, or Audit nulls and outliers. datasight runs the selected starter as soon as your data loads.
-
Configure your LLM — if needed, enter your provider and API key. (If you exported
ANTHROPIC_API_KEYor similar in your shell, this step is skipped.) -
Explore Files — enter the path to your CSV, Parquet, Excel, or DuckDB file (or a directory of CSV/Parquet/Excel files) and click Explore.
datasight creates an in-memory database, introspects the schema, and drops you into the chat UI.
Adding more files
Use the input at the top of the sidebar (below Tables) to add more files to your session at any time.
Save as a project¶
Once you're comfortable with your data, click Save in the header to persist your session as a project. datasight will:
- Create a project directory with a DuckDB database (views pointing to your original files — no data copying)
- Auto-generate
schema_description.mdandqueries.yamlusing the LLM - Seed a
measures.yamlscaffold from the inferred semantic measures
See Set up a project for the full project workflow.
Generate project files from files¶
datasight generate can also work directly with files to auto-generate
schema documentation:
Examples:
# Reference an existing DuckDB database directly
datasight generate generation.duckdb
# Reference an existing SQLite database directly
datasight generate generation.sqlite
# Create ./database.duckdb from CSV inputs
datasight generate generation.csv plants.csv
# Create ./database.duckdb from Parquet inputs
datasight generate generation.parquet plants.parquet
# Create ./database.duckdb from Excel inputs (one table per sheet)
datasight generate generation.xlsx plants.xlsx
# Create a custom project DuckDB from CSV inputs
datasight generate generation.csv plants.csv --db-path db/project.duckdb
# Create a custom project DuckDB from Parquet inputs
datasight generate generation.parquet plants.parquet --db-path db/project.duckdb
--db-path is an output path. Use it only when datasight is creating a
DuckDB project database from CSV, Parquet, Excel, or mixed file inputs.
Do not use --db-path with a single existing DuckDB or SQLite database;
those files are referenced directly in .env.
This creates schema_description.md, queries.yaml, measures.yaml,
and time_series.yaml in the current directory using the LLM. See
Set up a project for details.
Supported file types¶
| Type | Example | How it's handled |
|---|---|---|
| CSV | data.csv |
Loaded via DuckDB's read_csv_auto |
| Parquet | data.parquet |
Loaded via DuckDB's read_parquet |
| Excel | data.xlsx |
Each sheet materialized into DuckDB via pandas + openpyxl |
| DuckDB | data.duckdb |
Referenced directly when it is the only input |
| SQLite | data.sqlite |
Referenced directly when it is the only input |
| Parquet directory | data_dir/ |
Hive-partitioned parquet with read_parquet glob |
| CSV directory | data_dir/ |
All CSVs loaded via read_csv_auto glob |
For CSV and Parquet inputs, each file becomes a view in an ephemeral
in-memory DuckDB database. The view name is derived from the filename
(e.g. generation.parquet becomes the generation table).
Excel workbooks¶
DuckDB has no native Excel reader, so Excel workbooks are read through
pandas (with the openpyxl engine) and each sheet is inserted as a full
DuckDB table — not a view.
- A single-sheet workbook produces one table named after the file
(e.g.
plants.xlsx→plants). - A multi-sheet workbook produces one table per sheet, named after
the sheet (e.g. sheets
generationandplants→ tablesgenerationandplants). Sheets whose names already exist in the session get a numeric suffix (generation_2). - Changes to the underlying
.xlsxfile are not picked up automatically — unlike CSV/Parquet views, the data has been copied into DuckDB. Reload the session (or rundatasight generateagain) after editing the workbook. - Data is read in full into memory, so very large workbooks may be slow. For large tabular data, convert to CSV or Parquet first.