Data quality reports

What ST-1 flags about a source, and how to bake fixes into a DBT.

Every uploaded source gets profiled: type per column, null counts, distinct counts, ranges. The Quality panel on a source’s detail page surfaces the things that usually need attention before downstream analysis can be trusted.
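To make the profile concrete, here is a minimal sketch in plain Python of the four statistics named above. It is an illustration only, not ST-1's implementation: the function names `profile_column` and `profile_table`, the dictionary keys, and the use of Python type names in place of SQL types are all assumptions.

```python
def profile_column(values):
    """Summarize one column: inferred type, null count, distinct count, range."""
    non_null = [v for v in values if v is not None]
    type_names = sorted({type(v).__name__ for v in non_null})
    if not type_names:
        inferred = "unknown"      # column is entirely null
    elif len(type_names) == 1:
        inferred = type_names[0]
    else:
        inferred = "mixed"        # more than one Python type present
    # A range is only meaningful when every value has the same type.
    comparable = inferred not in ("unknown", "mixed")
    return {
        "type": inferred,
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if comparable else None,
        "max": max(non_null) if comparable else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = list(rows[0]) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}


rows = [
    {"id": 1, "country": "USA", "amount": 10.0},
    {"id": 2, "country": None, "amount": 12.5},
    {"id": 3, "country": "usa", "amount": None},
]
profile = profile_table(rows)
```

Here `profile["country"]` reports one null and two distinct values, which is the raw material the flags below are derived from.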

What gets flagged

  • Mostly-null columns — ≥80% of rows missing, where most aggregates would mislead.
  • Whitespace — leading or trailing spaces in string values (every join and group-by has to TRIM first, or matching values silently split into separate groups).
  • Case variants — distinct values that collapse meaningfully when case-folded (e.g. “USA” / “usa” / “Usa”). The auto-fix is offered only when the column is clearly categorical; for high-cardinality names and addresses the panel surfaces the finding but leaves the canonicalization to you.
  • Empty strings — empty string values that should almost always be NULL.
  • Strings that should be numeric — VARCHAR column where ≥90% of values parse as numbers (type lost on ingest).
  • Strings that should be dates — VARCHAR column where ≥90% of values match an ISO-ish date pattern.
  • Integer-valued floats — DOUBLE columns where every value is whole, suggesting BIGINT would be lossless and clearer.
  • Candidate keys — non-null columns where every row is distinct (likely a primary key worth marking as such).
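The checks in this list can be sketched as a single function over one column's values. This is a rough approximation in plain Python, not ST-1's actual rules: the flag names are hypothetical, the 90% thresholds are assumed to be measured over non-empty values, the date pattern is one possible reading of "ISO-ish", and the case-variant check fires on any collapse rather than judging whether it is meaningful.

```python
import re

# Assumed reading of "ISO-ish": starts with YYYY-MM-DD.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}")


def _is_number(text):
    """True if Python can parse the string as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def flag_column(values):
    """Return the list of (hypothetical) flag names that apply to one column."""
    flags = []
    total = len(values)
    if total == 0:
        return flags
    non_null = [v for v in values if v is not None]

    if (total - len(non_null)) / total >= 0.8:
        flags.append("mostly_null")

    strings = [v for v in non_null if isinstance(v, str)]
    if strings and len(strings) == len(non_null):  # treat as a VARCHAR column
        if any(s != s.strip() for s in strings):
            flags.append("whitespace")
        if any(s == "" for s in strings):
            flags.append("empty_strings")
        if len({s.casefold() for s in strings}) < len(set(strings)):
            flags.append("case_variants")
        filled = [s.strip() for s in strings if s.strip()]
        if filled:
            if sum(_is_number(s) for s in filled) / len(filled) >= 0.9:
                flags.append("string_should_be_numeric")
            elif sum(bool(ISO_DATE.match(s)) for s in filled) / len(filled) >= 0.9:
                flags.append("string_should_be_date")

    floats = [v for v in non_null if isinstance(v, float)]
    if floats and len(floats) == len(non_null) and all(v.is_integer() for v in floats):
        flags.append("integer_valued_float")

    # Candidate key: no nulls and every row distinct.
    if len(non_null) == total and len(set(non_null)) == total:
        flags.append("candidate_key")
    return flags
```

For example, `flag_column(["1", "2", "3.5", "2"])` returns `["string_should_be_numeric"]`, and `flag_column([10, 11, 12])` returns `["candidate_key"]`.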

Each fixable flag carries a fragment

Most issues come with a SQL fragment — a TRY_CAST for the type fixes, a TRIM for whitespace, a NULLIF(…, '') for empty strings. The panel composes them into a single SELECT * REPLACE (…) FROM <source> so the cleanup is a one-line DBT.
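As an illustration of how per-column fragments might be composed into one statement, the sketch below builds the `SELECT * REPLACE (…)` text in Python. The flag names, the nesting order (TRIM, then NULLIF, then TRY_CAST), the DOUBLE and DATE target types, and the double-quoted identifiers are assumptions made for this example; the SQL that ST-1 actually generates may differ.

```python
# Hypothetical flag names mapped to the kinds of fragments described above.
FRAGMENTS = {
    "whitespace": "TRIM({col})",
    "empty_strings": "NULLIF({col}, '')",
    "string_should_be_numeric": "TRY_CAST({col} AS DOUBLE)",
    "string_should_be_date": "TRY_CAST({col} AS DATE)",
    "integer_valued_float": "TRY_CAST({col} AS BIGINT)",
}

# Innermost fix first: trim, then null out empties, then cast.
ORDER = [
    "whitespace",
    "empty_strings",
    "string_should_be_numeric",
    "string_should_be_date",
    "integer_valued_float",
]


def quote_ident(name):
    """Double-quote an identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_cleanup_sql(source, flags_by_column):
    """Compose one SELECT * REPLACE statement from per-column flags."""
    replacements = []
    for column, flags in flags_by_column.items():
        original = quote_ident(column)
        expr = original
        for flag in ORDER:
            if flag in flags:
                expr = FRAGMENTS[flag].format(col=expr)
        if expr != original:  # skip columns with no fixable flag
            replacements.append(f"{expr} AS {original}")
    if not replacements:
        return f"SELECT * FROM {quote_ident(source)}"
    return f"SELECT * REPLACE ({', '.join(replacements)}) FROM {quote_ident(source)}"


sql = build_cleanup_sql(
    "orders",
    {
        "amount": ["whitespace", "string_should_be_numeric"],
        "note": ["empty_strings"],
        "id": ["candidate_key"],  # informational only, no fragment
    },
)
```

With these inputs `sql` is `SELECT * REPLACE (TRY_CAST(TRIM("amount") AS DOUBLE) AS "amount", NULLIF("note", '') AS "note") FROM "orders"`. Columns that carry only informational flags, such as a candidate key, are left out of the REPLACE list.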

One-click cleanup DBT

Click Create cleanup DBT and ST-1 saves a new DBT named <source>_clean with the suggested fragments applied. From then on, downstream analysis reads from the clean version; the source stays untouched. The Quality panel detects an existing _clean sibling and links to it instead of offering the button a second time.
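The button-or-link decision described above comes down to a name lookup. A minimal sketch, assuming the set of existing DBT names is available as a plain collection; the function name `cleanup_action` and the returned tuples are hypothetical, not part of ST-1:

```python
def cleanup_action(source_name, existing_dbt_names):
    """Decide whether to offer the create button or link to the existing DBT."""
    clean_name = f"{source_name}_clean"
    if clean_name in existing_dbt_names:
        return ("link", clean_name)    # a _clean sibling already exists
    return ("create", clean_name)      # offer Create cleanup DBT
```

For a source named `orders`, the first call would yield `("create", "orders_clean")`; once that DBT exists, the same call yields `("link", "orders_clean")`.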

From the AI side

The quality_report MCP tool returns the same flags the panel shows, plus a suggested_dbt_sql field — so an AI can read the issues, decide what to keep, and call create_dbt with an edited version.
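A sketch of the "decide what to keep" step, assuming a response shape that is not documented here. Only the `suggested_dbt_sql` field name comes from the description above; the `flags` list and its `column` and `fragment` keys are hypothetical, as is the helper `edited_dbt_sql`. The sketch also assumes at most one fragment per column and does not call any MCP tool itself; it only produces the SQL text that could then be passed to create_dbt.

```python
def edited_dbt_sql(report, source, skip_columns=()):
    """Rebuild the cleanup SQL, dropping fixes for columns the caller wants untouched."""
    fixable = [f for f in report["flags"] if f.get("fragment")]
    kept = [f for f in fixable if f["column"] not in skip_columns]
    if len(kept) == len(fixable):
        # Nothing was dropped, so the suggestion can be used as-is.
        return report["suggested_dbt_sql"]
    if not kept:
        return f"SELECT * FROM {source}"
    parts = ", ".join(f"{f['fragment']} AS {f['column']}" for f in kept)
    return f"SELECT * REPLACE ({parts}) FROM {source}"


# Hypothetical payload, shaped only for this example.
report = {
    "flags": [
        {"column": "amount", "fragment": "TRY_CAST(amount AS DOUBLE)"},
        {"column": "zip", "fragment": "TRY_CAST(zip AS DOUBLE)"},
        {"column": "id", "fragment": None},  # candidate key: nothing to fix
    ],
    "suggested_dbt_sql": (
        "SELECT * REPLACE (TRY_CAST(amount AS DOUBLE) AS amount, "
        "TRY_CAST(zip AS DOUBLE) AS zip) FROM orders"
    ),
}

# Keep zip as text: postal codes look numeric but are not quantities.
edited = edited_dbt_sql(report, "orders", skip_columns={"zip"})
```

Here `edited` is `SELECT * REPLACE (TRY_CAST(amount AS DOUBLE) AS amount) FROM orders`, which keeps the amount fix and leaves `zip` as a string.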