Inspect files

You have CSV, Parquet, or DuckDB files and want to understand what's in them. datasight can profile the data, surface measures and dimensions, suggest trends, and generate starter prompts — all without setting up a project or calling an LLM.

One command: datasight inspect

datasight inspect runs every deterministic analysis in one shot:

datasight inspect generation.parquet
datasight inspect generation.csv plants.csv
datasight inspect data_dir/

This prints:

  • Profile — table and column counts, row counts, largest tables, date coverage
  • Quality — null-heavy columns, suspicious numeric ranges, notes
  • Measures — inferred metrics with roles, default aggregations, rollup SQL
  • Dimensions — grouping candidates with distinct counts and sample values
  • Trends — date/measure pairs with chart recommendations
  • Recipes — suggested prompts for deeper exploration

By default, nothing is written to disk. Use --format json or --format markdown to change the output format, or add -o with a path (for example, -o report.md) to save the report to a file:

datasight inspect generation.parquet --format json
datasight inspect generation.parquet --format markdown -o overview.md
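To save one report per file, the flags above can be wrapped in a short script. This is a minimal Python sketch, not part of datasight: the helper names inspect_command and write_reports and the reports/ output directory are hypothetical, and it relies only on the --format and -o options shown above.

```python
import shutil
import subprocess
from pathlib import Path


def inspect_command(paths, fmt=None, output=None):
    """Build the argument list for `datasight inspect` from documented flags only."""
    cmd = ["datasight", "inspect", *[str(p) for p in paths]]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def write_reports(files, out_dir="reports"):
    """Run one inspect per file and save each as <out_dir>/<stem>.md (hypothetical layout)."""
    if shutil.which("datasight") is None:
        raise RuntimeError("datasight is not on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for f in files:
        target = out / f"{Path(f).stem}.md"
        subprocess.run(inspect_command([f], fmt="markdown", output=target), check=True)
```

Calling write_reports(["generation.parquet", "plants.csv"]) would then run the same command as the markdown example above once per file.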

Explore files in the web UI

If you prefer a visual interface, the web UI can also work with files directly — no project setup required:

datasight run

Open http://localhost:8084. The landing page shows:

  1. Guided starters — choose a workflow like Profile this dataset, Find key dimensions, Build a trend chart, or Audit nulls and outliers. datasight runs the selected starter as soon as your data loads.

  2. Configure your LLM — if needed, enter your provider and API key. (If you exported ANTHROPIC_API_KEY or similar in your shell, this step is skipped.)

  3. Explore Files — enter the path to your CSV, Parquet, or DuckDB file (or a directory of Parquet files) and click Explore.

datasight creates an in-memory database, introspects the schema, and drops you into the chat UI.

Adding more files

Use the input at the top of the sidebar (below Tables) to add more files to your session at any time.

Save as a project

Once you're comfortable with your data, click Save in the header to persist your session as a project. datasight will:

  • Create a project directory with a DuckDB database (views pointing to your original files — no data copying)
  • Auto-generate schema_description.md and queries.yaml using the LLM
  • Seed a measures.yaml scaffold from the inferred semantic measures

See Set up a project for the full project workflow.

Generate project files from data files

datasight generate can also work directly with files to auto-generate schema documentation:

datasight generate generation.parquet plants.csv

Examples:

# Reference an existing DuckDB database directly
datasight generate generation.duckdb

# Reference an existing SQLite database directly
datasight generate generation.sqlite

# Create ./database.duckdb from CSV inputs
datasight generate generation.csv plants.csv

# Create ./database.duckdb from Parquet inputs
datasight generate generation.parquet plants.parquet

# Create a custom project DuckDB from CSV inputs
datasight generate generation.csv plants.csv --db-path db/project.duckdb

# Create a custom project DuckDB from Parquet inputs
datasight generate generation.parquet plants.parquet --db-path db/project.duckdb

--db-path is an output path. Use it only when datasight is creating a DuckDB project database from CSV, Parquet, or mixed file inputs. Do not use --db-path with a single existing DuckDB or SQLite database; those files are referenced directly in .env.
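The --db-path rule can be written as a small check. The sketch below is illustrative Python, not datasight's own logic: the function name db_path_applies is hypothetical, and recognizing databases by the .duckdb and .sqlite suffixes is an assumption. It encodes only the rule stated above; a database file passed alongside other inputs is not covered by this page, so the result for that case is a guess.

```python
from pathlib import Path

# Assumption: existing databases are recognized by file suffix.
DATABASE_SUFFIXES = {".duckdb", ".sqlite"}


def db_path_applies(inputs):
    """Return True when --db-path is meaningful for these inputs.

    A single existing DuckDB or SQLite file is referenced directly,
    so no project database is created and --db-path does not apply.
    """
    paths = [Path(p) for p in inputs]
    if not paths:
        raise ValueError("at least one input is required")
    single_existing_db = (
        len(paths) == 1 and paths[0].suffix.lower() in DATABASE_SUFFIXES
    )
    return not single_existing_db
```

For example, db_path_applies(["generation.csv", "plants.csv"]) is True, while db_path_applies(["generation.duckdb"]) is False.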

Running datasight generate creates schema_description.md, queries.yaml, measures.yaml, and time_series.yaml in the current directory, using the LLM to write them. See Set up a project for details.

Supported file types

| Type | Example | How it's handled |
| --- | --- | --- |
| CSV | data.csv | Loaded via DuckDB's read_csv_auto |
| Parquet | data.parquet | Loaded via DuckDB's read_parquet |
| DuckDB | data.duckdb | Referenced directly when it is the only input |
| SQLite | data.sqlite | Referenced directly when it is the only input |
| Parquet directory | data_dir/ | Hive-partitioned Parquet loaded via a read_parquet glob |
| CSV directory | data_dir/ | All CSVs loaded via a read_csv_auto glob |

For CSV and Parquet inputs, each file becomes a view in an ephemeral in-memory DuckDB database while documentation is generated. The view name is derived from the filename (e.g. generation.parquet becomes the generation view).
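To make the file-to-view mapping concrete, the following Python sketch builds the kind of DuckDB statement that could back such a view. It is an assumption about the general shape, not datasight's actual implementation: the helper names are hypothetical, the view name is taken to be the filename without its extension, and any sanitizing or collision handling datasight performs is not shown.

```python
from pathlib import Path

# DuckDB table functions named in the table of supported file types.
READERS = {".csv": "read_csv_auto", ".parquet": "read_parquet"}


def view_name(path):
    """Assumed naming rule: the filename without its extension."""
    return Path(path).stem


def create_view_sql(path):
    """Return a CREATE VIEW statement that reads the file in place (no data copy)."""
    p = Path(path)
    reader = READERS[p.suffix.lower()]  # KeyError for unsupported types
    name = view_name(p).replace('"', '""')   # escape identifier quotes
    source = p.as_posix().replace("'", "''")  # escape string quotes
    return f"CREATE VIEW \"{name}\" AS SELECT * FROM {reader}('{source}')"


print(create_view_sql("generation.parquet"))
```

Because a view stores only the query, the underlying file is read each time the view is queried, which is why no data needs to be copied.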