
Scale and sampling

Where ST-1 keeps interaction snappy, where it samples for preview, and where it pushes work to the server.

ST-1 has three layers, sized to keep interaction in a single frame and the canonical data honest. Each layer kicks in based on the source’s row count.

Up to ~2M rows — sub-frame interactive

Mosaic precomputes column-summary cubes inside DuckDB-WASM in your browser. Drag a histogram bar; the table, every other column summary, and any chart on top all repaint in the same frame. No round-trips, no streaming spinner.

2M – 50M rows — auto-sampled preview

Past the threshold, sources materialize as a 500,000-row reservoir sample for the in-browser preview. Every panel reading from the sample carries a "sampled" badge so you always know what you’re looking at. The canonical Parquet on the server is untouched; agents calling query via MCP read the canonical data, not the sample.

The sampling toggle lives in the project sidebar (Sample large sources). Turn it off when you need exact counts in the preview; expect the page to feel slow on big sources.

50M+ rows — server-side compute

The browser is no longer the right place for the heavy lift. Run aggregates through the MCP query tool, so your AI gets canonical data, or author a derived rollup that’s small enough to explore directly. The browser sees the result; the heavy work stays where the data lives. Upload limit is 1 GiB per source.
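As a rough illustration of how the three tiers relate to row count, here is a minimal Python sketch. It is not ST-1's actual code: the function names, the tier labels, and the treatment of the exact boundaries (the docs say "~2M", so whether 2,000,000 itself counts as interactive is an assumption) are all hypothetical.

```python
# Thresholds taken from the tiers described above; exact boundary
# handling is an assumption, since the docs only say "~2M" and "50M+".
INTERACTIVE_MAX_ROWS = 2_000_000
SAMPLED_MAX_ROWS = 50_000_000
SAMPLE_SIZE = 500_000


def tier_for(row_count: int) -> str:
    """Return a hypothetical label for the tier a source falls into."""
    if row_count <= INTERACTIVE_MAX_ROWS:
        return "interactive"        # full data, in-browser cubes
    if row_count < SAMPLED_MAX_ROWS:
        return "sampled-preview"    # reservoir sample in the browser
    return "server-side"            # heavy aggregates belong on the server


def preview_is_sampled(row_count: int, sample_large_sources: bool = True) -> bool:
    """Only sources over the 2M threshold sample, and only with the toggle on."""
    return sample_large_sources and row_count > INTERACTIVE_MAX_ROWS


for rows in (150_000, 12_000_000, 80_000_000):
    print(rows, tier_for(rows), preview_is_sampled(rows))
```

With the toggle off, `preview_is_sampled` returns `False` for every source, which matches the "exact counts, slower page" trade-off described above.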

The sampling toggle in detail

What samples — only sources over 2M rows. Smaller sources and every DBT preview always read full data.
How — a deterministic reservoir sample so re-opens see the same preview rows.
What doesn’t sample — server-side MCP calls (query, sample, quality_report) and server-rendered chart PNGs read the canonical data.
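To make "deterministic reservoir sample" concrete, the sketch below shows the classic single-pass reservoir technique (Algorithm R) driven by a fixed seed. This is an assumption about the general approach, not ST-1's implementation: the function name, the `seed` parameter, and the use of Python's `random.Random` are illustrative only.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Pick k rows uniformly at random in one pass (Algorithm R).

    A fixed seed makes the result repeatable: the same input order and
    the same seed always yield the same sample.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


first = reservoir_sample(range(10_000), k=100, seed=42)
second = reservoir_sample(range(10_000), k=100, seed=42)
print(first == second)  # same seed and input, so the samples match
```

Because the generator is seeded, re-running the function over the same rows reproduces the same sample, which is the property that lets a re-opened preview show the same rows. A source with fewer than `k` rows is returned whole.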