Data quality reports
What ST-1 flags about a source, and how to bake fixes into a DBT.
Every uploaded source gets profiled: type per column, null counts, distinct counts, ranges. The Quality panel on a source’s detail page surfaces the things that usually need attention before downstream analysis can be trusted.
What gets flagged
- Mostly-null columns: ≥80% of rows missing, where most aggregates would mislead.
- Whitespace: leading or trailing space in string values (every join and group-by has to TRIM first or it silently splits).
- Case variants: distinct values that collapse meaningfully when case-folded (e.g. “USA” / “usa” / “Usa”). Auto-fix only when the column is clearly categorical; for high-cardinality names and addresses the panel surfaces the finding but leaves the canonicalization to you.
- Empty strings: empty string values that should almost always be NULL.
- Strings that should be numeric: VARCHAR column where ≥90% of values parse as numbers (type lost on ingest).
- Strings that should be dates: VARCHAR column where ≥90% of values match an ISO-ish date pattern.
- Integer-valued floats: DOUBLE columns where every value is whole, suggesting BIGINT would be lossless and clearer.
- Candidate keys: non-null columns where every row is distinct (likely a primary key worth marking as such).
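As a rough illustration of the checks listed above, the following Python sketch applies most of them to a single column held as a list. It is not ST-1's profiler: the function name and flag names are hypothetical, Python's float() stands in for a SQL numeric cast, and the date check is left out because the exact pattern ST-1 matches is not stated here. Only the 80% and 90% thresholds come from the list.

```python
from typing import Any, Optional

NULL_THRESHOLD = 0.80   # "mostly-null" cutoff taken from the list
PARSE_THRESHOLD = 0.90  # share of strings that must parse as numbers


def _is_number(text: str) -> bool:
    """True if the string parses as a float (stand-in for a SQL cast)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def flag_column(values: list[Optional[Any]]) -> set[str]:
    """Return the (hypothetical) flag names that apply to one column."""
    flags: set[str] = set()
    if not values:
        return flags

    non_null = [v for v in values if v is not None]
    nulls = len(values) - len(non_null)
    if nulls / len(values) >= NULL_THRESHOLD:
        flags.add("mostly_null")

    strings = [v for v in non_null if isinstance(v, str)]
    if strings and len(strings) == len(non_null):  # treat as a string column
        if any(s != s.strip() for s in strings):
            flags.add("whitespace")
        if any(s == "" for s in strings):
            flags.add("empty_strings")
        if sum(_is_number(s) for s in strings) / len(strings) >= PARSE_THRESHOLD:
            flags.add("numeric_strings")
        # Fewer distinct values after case-folding means case variants exist.
        if len({s.casefold() for s in strings}) < len(set(strings)):
            flags.add("case_variants")

    floats = [v for v in non_null if isinstance(v, float)]
    if floats and len(floats) == len(non_null) and all(f.is_integer() for f in floats):
        flags.add("integer_valued_floats")

    # Candidate key: no nulls and every row distinct.
    if nulls == 0 and len(set(values)) == len(values):
        flags.add("candidate_key")
    return flags
```

The sketch reports case variants for any string column; deciding whether the column is "clearly categorical" enough to auto-fix is a separate judgment that it does not attempt.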
Each fixable flag carries a fragment
Most issues come with a SQL fragment — a TRY_CAST for the type fixes, a TRIM for whitespace, a NULLIF(…, '') for empty strings. The panel composes them into a single SELECT * REPLACE (…) FROM <source> so the cleanup is a one-line DBT.
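To make the composition step concrete, here is a minimal Python sketch that joins per-column fragments into one SELECT * REPLACE (…) statement. The function name and the dictionary shape are assumptions made for illustration, not ST-1's internal code, and identifiers are not quoted or escaped.

```python
def compose_cleanup_sql(source: str, fragments: dict[str, str]) -> str:
    """Build one SELECT * REPLACE (...) statement from per-column fragments.

    `fragments` maps a column name to the expression that should replace it
    (a hypothetical shape chosen for this sketch).
    """
    if not fragments:
        # Nothing to fix: the cleanup is a plain pass-through.
        return f"SELECT * FROM {source}"
    replacements = ", ".join(
        f"{expr} AS {column}" for column, expr in fragments.items()
    )
    return f"SELECT * REPLACE ({replacements}) FROM {source}"


# Example with a made-up source and columns.
sql = compose_cleanup_sql(
    "orders",
    {
        "name": "NULLIF(TRIM(name), '')",
        "amount": "TRY_CAST(amount AS DOUBLE)",
    },
)
print(sql)
```

When a column needs more than one fix, the fragments nest into a single expression, as with NULLIF(TRIM(name), '') in the example, since a column can appear only once in the REPLACE list.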
One-click cleanup DBT
Click Create cleanup DBT and ST-1 saves a new DBT named <source>_clean with the suggested fragments applied. From then on, downstream analysis reads from the clean version; the source stays untouched. The Quality panel detects an existing _clean sibling and links to it instead of offering the button a second time.
From the AI side
The quality_report MCP tool returns the same flags the panel shows, plus a suggested_dbt_sql field — so an AI can read the issues, decide what to keep, and call create_dbt with an edited version.
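The read, decide, create flow might look like the Python sketch below. Only the tool names quality_report and create_dbt and the suggested_dbt_sql field come from this page; the shape of the flags list, the create_dbt argument names, and the reuse of the <source>_clean name are assumptions made for illustration.

```python
from typing import Any


def plan_create_dbt(
    report: dict[str, Any],
    source: str,
    skip_kinds: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Turn a quality report into arguments for a create_dbt call.

    Assumed (hypothetical) report shape:
      {"flags": [{"kind": str, "column": str, "fragment": str or None}, ...],
       "suggested_dbt_sql": str}
    """
    fixable = [flag for flag in report["flags"] if flag.get("fragment")]
    kept = [flag for flag in fixable if flag["kind"] not in skip_kinds]

    if len(kept) == len(fixable):
        # Nothing was dropped, so the suggested SQL can be used unchanged.
        sql = report["suggested_dbt_sql"]
    elif kept:
        # Rebuild the statement from the fragments that were kept.
        replacements = ", ".join(
            f"{flag['fragment']} AS {flag['column']}" for flag in kept
        )
        sql = f"SELECT * REPLACE ({replacements}) FROM {source}"
    else:
        sql = f"SELECT * FROM {source}"

    # Argument names "name" and "sql" are placeholders for create_dbt's inputs.
    return {"name": f"{source}_clean", "sql": sql}
```

The sketch assumes at most one fragment per column; if two kept flags target the same column, their fragments would need to be nested into one expression before the statement is rebuilt.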