For Azure Data Factory teams

Your pipeline says success. Your data says otherwise.

Catch the missing rows and broken numbers ADF can't see — without a single row leaving your Azure. General Validation recomputes source against target inside your own environment and hands back run evidence you can put in front of anyone.

Onboarding a handful of design-partner teams this quarter. Your data never leaves your Azure tenant.

Validation Tests orders → orders_dw
Validation tests for pair orders to orders_dw. Two checks passed — OUTER_VALUE on order_id and COUNT_ROWS. The VALUE check on amount failed: 8 of 4,812 rows differ, including order 4471 where source is 120.00 and target is 90.00, a difference of minus 30.00.
Status Function Field A
PASS OUTER_VALUE order_id
PASS COUNT_ROWS
FAIL VALUE amount
Result Detail 8 of 4,812 differ
Key Source → Target Δ
  • #4471 120.00 → 90.00 −30.00
  • #4472 75.50 → 75.05 −0.45
  • +6 more →

metadata only · runs in your Azure · evidence in your storage

Runs inside your own Azure

Metadata & results only — never your rows

One reusable ADF pipeline

The problem

The pipeline went green. The number was wrong.

As long as the final metric looks positive, it ships — a thumbs-up, no further review.

Rows drop, types coerce silently, late files land half-written, and the run still reports green. The one engineer who asks “is this data actually right?” ends up with a target on their back — the question hands responsibility back to the business, so it's easier if nobody asks. The number that's off by two percent rides the dashboard for a week. And when someone finally recreates the total and lands on a different answer, dropping the conversation is easier than having it.

General Validation recreates the number for you — source against target, every run — and hands back evidence no one can wave away. The question stops being yours to carry. The data answers it.

The same run, two truths

PASS pipeline orchestration ran clean Succeeded
FAIL row_count 2 rows never landed Δ −2

ADF reported the first line. Only a data check reports the second.

How it works

Discover, define, validate.

Three steps, no agent in your data path. The product reads metadata and orchestrates; your Azure does the reading and writing.

  1. Discover

    Connect your Data Factory. We read metadata only (datasets, linked services, pipelines, and schemas), never your row data.

  2. Define

    Pair a source and target, then declare the checks that matter: counts, sums, distinct values, value matches, set membership.

  3. Validate

    Checks compile into one reusable ADF Mapping Data Flow and run inside your Azure. Only results and diagnostics come back.
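
To make the Define step concrete, here is a minimal sketch of what a declared pair and its checks could look like as plain data. The class and field names (`Check`, `ValidationPair`, `describe`) are hypothetical illustrations, not the product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical shapes for illustration only; not the product's real schema.
@dataclass(frozen=True)
class Check:
    function: str                 # e.g. "row_count", "sum", "outer_value"
    column: Optional[str] = None  # None for checks that need no field


@dataclass
class ValidationPair:
    source: str
    target: str
    checks: List[Check] = field(default_factory=list)


def describe(pair: ValidationPair) -> List[str]:
    """Render each check as a short label such as 'sum(amount)'."""
    return [
        f"{c.function}({c.column})" if c.column else c.function
        for c in pair.checks
    ]


# The orders -> orders_dw pair used in the examples on this page.
pair = ValidationPair(
    source="orders",
    target="orders_dw",
    checks=[
        Check("row_count"),
        Check("sum", "amount"),
        Check("outer_value", "order_id"),
    ],
)
```

Keeping the definition as data rather than code is one way a set of checks could be compiled into a single reusable pipeline.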

Why teams trust the result

It tells a broken pipeline apart from bad data.

Every run moves through four stages, so an infrastructure failure is never confused with a data failure. When something's wrong, you know exactly which — and where to look.

  1. Preflight

    Resources and inputs are checked before anything runs.

  2. ADF execution

    Your Azure reads the source and target and runs the checks.

  3. Delta ingestion

    Results land in your own Delta storage and are read back.

  4. Validation outcome

    Pass or fail on the data itself: the answer you came for.
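
The four stages above suggest a simple way to separate the two kinds of failure. The sketch below assumes that a failure in any of the first three stages counts as an infrastructure failure and that only the final stage speaks to the data. The names `Stage` and `classify` are hypothetical, not part of the product.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PREFLIGHT = 1
    ADF_EXECUTION = 2
    DELTA_INGESTION = 3
    VALIDATION_OUTCOME = 4


def classify(failed_stage: Optional[Stage]) -> str:
    """Map the stage where a run stopped to the kind of problem it signals.

    Assumption: only the last stage evaluates the data itself, so a
    failure anywhere earlier is treated as an infrastructure problem.
    """
    if failed_stage is None:
        return "pass"
    if failed_stage is Stage.VALIDATION_OUTCOME:
        return "data failure"
    return "infrastructure failure"
```

For example, `classify(Stage.PREFLIGHT)` reports an infrastructure failure, while `classify(Stage.VALIDATION_OUTCOME)` reports a data failure.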

Scope — today

What you can validate today.

Flat tabular data across the formats Azure Data Factory reads natively. This is what runs end-to-end right now, not a roadmap.

Runnable formats

Pair any two runnable datasets as source and target. ADF reads both sides in your Azure environment; General Validation stores result metadata, not source or target rows.

Runs end-to-end today
  • Parquet

    Columnar files in ADF-managed storage paths.

  • CSV / delimited text

    Flat delimited datasets with schema discovery.

  • Delta Lake

    Lakehouse tables read and written inside Azure.

  • Azure SQL

    Azure SQL Database, Azure SQL Managed Instance, and Synapse SQL.

Twelve validation checks

Grouped by the question they answer: are the records present, do the measures agree, do joined values match, and do sets reconcile?

tolerances · casts · evidence

Completeness

Prove the expected records actually landed.

  • row_count

    Total rows, source vs target

  • count

    Count of a chosen field

  • distinct_count

    Distinct values of a field

Aggregates

Catch numeric, date, and timestamp drift.

  • sum

    Sum of a numeric field

  • avg

    Average of a numeric field

  • min

    Minimum value

  • max

    Maximum value

Row Values

Compare records across joined pairs.

  • value

    Row-level value match across a join

  • outer_value

    Value match, counting unmatched rows
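
As a rough illustration of the difference between the two row-value checks, the sketch below compares keyed values in memory. It assumes `value` behaves like an inner join on the key and `outer_value` additionally reports keys found on only one side. That reading, the function names, and keys 4470 and 4473 are assumptions for the example; in the product the comparison runs inside ADF, not in Python.

```python
from decimal import Decimal


def value_check(source, target):
    """Compare values for keys present on both sides (inner join).

    Returns {key: (source_value, target_value, delta)} for mismatches.
    """
    diffs = {}
    for key in source.keys() & target.keys():
        if source[key] != target[key]:
            diffs[key] = (source[key], target[key], target[key] - source[key])
    return diffs


def outer_value_check(source, target):
    """Same comparison, plus the keys that exist on only one side."""
    unmatched = sorted(source.keys() ^ target.keys())
    return value_check(source, target), unmatched


# Orders 4471 and 4472 mirror the sample on this page; 4470 and 4473
# are made-up keys that each exist on one side only.
source = {4470: Decimal("10.00"), 4471: Decimal("120.00"), 4472: Decimal("75.50")}
target = {4471: Decimal("90.00"), 4472: Decimal("75.05"), 4473: Decimal("5.00")}

diffs, unmatched = outer_value_check(source, target)
```

Here `diffs` holds the two mismatched orders with their deltas, and `unmatched` lists the keys that a plain inner-join comparison would silently skip.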

Sets

Check membership and equality of column sets.

  • A ⊆ B

    Set A contained in B

  • B ⊆ A

    Set B contained in A

  • A = B

    Set equality
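
The three set checks map directly onto ordinary set operations. A minimal sketch, assuming each side is reduced to the distinct values of one column; the function name and result keys are hypothetical.

```python
def set_checks(a, b):
    """Evaluate the three set relations on the distinct values of a and b."""
    set_a, set_b = set(a), set(b)
    return {
        "a_subset_b": set_a <= set_b,  # A ⊆ B
        "b_subset_a": set_b <= set_a,  # B ⊆ A
        "a_equals_b": set_a == set_b,  # A = B
    }


# Illustrative values: every source country code exists in the target,
# but the target has one extra code.
result = set_checks(["DE", "FR", "DE"], ["DE", "FR", "US"])
```

Duplicates do not affect the outcome, since only membership is compared.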

Numeric aggregates support tolerances.

Date, timestamp, and string checks are exact-match.

Casts are opt-in and validated before a run.
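
The tolerance rule can be pictured as a single comparison function. This sketch assumes an absolute tolerance (the page does not say whether tolerances are absolute or relative) and treats a missing tolerance as exact match. The name `matches` is hypothetical.

```python
from decimal import Decimal


def matches(source, target, tolerance=None):
    """Exact match by default; numeric match within an absolute tolerance.

    Assumption: tolerance is an absolute difference. Dates, timestamps,
    and strings are compared without a tolerance, so they must be equal.
    """
    if tolerance is None:
        return source == target
    return abs(target - source) <= tolerance


# A sum that drifts by 0.02 passes with a 0.05 tolerance and fails without one.
within = matches(Decimal("1000.00"), Decimal("999.98"), Decimal("0.05"))
exact = matches(Decimal("1000.00"), Decimal("999.98"))
```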

The evidence

Every run leaves a record you can defend.

Not another green checkmark — a small, repeatable record of what was compared, which rule failed, how far it drifted, and where to inspect the evidence.

When someone says the number's fine and you're not sure, you don't argue — you show the run.

  • Per-test results: source, target, delta, threshold, and PASS / FAIL
  • Run history with the four-stage lifecycle on every run
  • Bad-record drill-through, read on demand from your own storage
  • Reporting dashboard, per-test history, and CSV export
Validation run run_2f9c… for pair orders → orders_dw. Row-count check failed: source 1,000 versus target 998, delta minus 2. Sum, distinct-count, and max checks passed.
VALIDATION RUN · run_2f9c… pair: orders → orders_dw
Test Status
row_count FAIL Δ −2
sum(amount) PASS
distinct(id) PASS
max(updated) PASS
metadata only · runs in your Azure · evidence kept in your storage
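
For a sense of what a per-test result could look like once exported, the sketch below writes the failing row-count result from the run above to CSV with Python's standard `csv` module. The column names follow the fields listed earlier (source, target, delta, threshold, status); the exact export layout and the threshold of 0 are assumptions.

```python
import csv
import io

# Assumed column order; the real export layout may differ.
FIELDS = ["test", "source", "target", "delta", "threshold", "status"]


def to_csv(results):
    """Serialize per-test result dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()


# The row_count failure shown in the sample run: 1,000 source rows, 998 landed.
results = [
    {"test": "row_count", "source": 1000, "target": 998,
     "delta": -2, "threshold": 0, "status": "FAIL"},
]

exported = to_csv(results)
```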

Security & privacy

Your security team already said yes.

We store validation metadata and results — schemas, run status, metrics, diagnostics, and pointers to evidence — not your source or target rows. The data stays inside your Azure boundary, where it already lives.

Customer-owned Azure

It runs where your data already lives.

The application, worker, runtime, storage, secrets, and evidence all sit inside infrastructure your team controls.

No vendor data lake

Your rows never leave your tenant.

Source rows, target rows, validation outputs, and bad-record evidence stay in your Azure. We never copy them out.

Enterprise controls

The controls your security team expects.

OIDC / SSO sign-in, role-based access, tenant isolation, and audit logging — supported and tested, not a compliance badge.

Private beta

We're onboarding a limited number of design-partner teams this quarter — chosen for real Azure Data Factory workloads, not logos. Early partners get priority support and a direct line into the roadmap.

Get started

Be first to run it on your pipelines.

Request access, or book a call and we'll walk through your Azure Data Factory workload together: what to validate first, how it runs in your environment, and what your team gets back.

Private beta · metadata only · runs in your Azure.

Request early access