For Azure Data Factory teams

Your pipeline says success. Your data says otherwise.

Catch the missing rows and broken numbers ADF can't see — without a single row leaving your Azure. General Validation recomputes source against target inside your own environment and hands back run evidence you can put in front of anyone.

Request early access · Book a demo call →

Onboarding a handful of design-partner teams this quarter. Your data never leaves your Azure tenant.

Validation Tests orders → orders_dw

Validation tests for pair orders to orders_dw. Two checks passed — OUTER_VALUE on order_id and COUNT_ROWS. The VALUE check on amount failed: 8 of 4,812 rows differ, including order 4471 where source is 120.00 and target is 90.00, a difference of minus 30.00.
Status	Function	Field A	Op	Field B
PASS	OUTER_VALUE	order_id	=	order_id
PASS	COUNT_ROWS	—	>	—
FAIL	VALUE	amount	=	amount

Result Detail · VALUE amount · 8 of 4,812 differ

Key	Source	Target	Δ
#4471	120.00	90.00	−30.00
#4472	75.50	75.05	−0.45
+6 more →

metadata only · runs in your Azure · evidence in your storage

Runs inside your own Azure

Metadata & results only — never your rows

One reusable ADF pipeline

The problem

The pipeline went green. The number was wrong.

As long as the final metric looks positive, it ships — a thumbs-up, no further review.

Rows drop, types coerce silently, late files land half-written, and the run still reports green. The one engineer who asks “is this data actually right?” ends up with a target on their back — the question hands responsibility back to the business, so it's easier if nobody asks. The number that's off by two percent rides the dashboard for a week. And when someone finally recreates the total and lands on a different answer, dropping the conversation is easier than having it.

General Validation recreates the number for you — source against target, every run — and hands back evidence no one can wave away. The question stops being yours to carry. The data answers it.

Request early access

The same run, two truths

PASS	pipeline	orchestration ran clean	Succeeded
FAIL	row_count	2 rows never landed	Δ −2

ADF reported the first line. Only a data check reports the second.

How it works

Discover, define, validate.

Three steps, no agent in your data path. The product reads metadata and orchestrates; your Azure does the reading and writing.

01 Discover
Connect your Data Factory. We read metadata only (datasets, linked services, pipelines, and schemas), never your row data.

02 Define
Pair a source and target, then declare the checks that matter: counts, sums, distinct values, value matches, set membership.

03 Validate
Checks compile into one reusable ADF Mapping Data Flow and run inside your Azure. Only results and diagnostics come back.
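To make the Define step concrete, here is a minimal Python sketch of what declaring a pair and its checks could look like. Every name in it (Check, ValidationPair, CHECK_FUNCTIONS, the function identifiers for the set checks) is a hypothetical stand-in for illustration, not the product's actual API; the example pair and checks mirror the orders → orders_dw panel above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical identifiers for the twelve checks; not the product's actual API.
CHECK_FUNCTIONS = {
    "row_count", "count", "distinct_count",
    "sum", "avg", "min", "max",
    "value", "outer_value",
    "a_subset_b", "b_subset_a", "a_equals_b",
}


@dataclass(frozen=True)
class Check:
    function: str
    field_a: Optional[str] = None
    field_b: Optional[str] = None

    def __post_init__(self) -> None:
        if self.function not in CHECK_FUNCTIONS:
            raise ValueError(f"unknown check function: {self.function}")
        # row_count compares whole datasets; every other check needs a field on each side.
        if self.function != "row_count" and not (self.field_a and self.field_b):
            raise ValueError(f"{self.function} needs a field on both sides")


@dataclass
class ValidationPair:
    source: str
    target: str
    checks: List[Check] = field(default_factory=list)


pair = ValidationPair(
    source="orders",
    target="orders_dw",
    checks=[
        Check("outer_value", "order_id", "order_id"),
        Check("row_count"),
        Check("value", "amount", "amount"),
    ],
)
```

Rejecting malformed checks at declaration time is one way to surface mistakes before anything runs in Azure.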

Why teams trust the result

It tells a broken pipeline apart from bad data.

Every run moves through four stages, so an infrastructure failure is never confused with a data failure. When something's wrong, you know exactly which — and where to look.

01 Preflight
Resources and inputs are checked before anything runs.

02 ADF execution
Your Azure reads the source and target and runs the checks.

03 Delta ingestion
Results land in your own Delta storage and are read back.

04 Validation outcome
Pass or fail on the data itself: the answer you came for.
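One way to picture the four stages is as an ordered enum in which only the last stage speaks about the data. The sketch below is illustrative only: Stage and classify_run are hypothetical names, and the rule that any failure before the final stage counts as an infrastructure failure is an assumption drawn from the description above.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PREFLIGHT = 1
    ADF_EXECUTION = 2
    DELTA_INGESTION = 3
    VALIDATION_OUTCOME = 4


def classify_run(failed_stage: Optional[Stage]) -> str:
    """Label a run by the stage that failed, or None if every stage passed."""
    if failed_stage is None:
        return "passed"
    # Assumption: only the final stage judges the data itself.
    if failed_stage is Stage.VALIDATION_OUTCOME:
        return "data failure"
    return "infrastructure failure"
```

For example, classify_run(Stage.DELTA_INGESTION) yields "infrastructure failure", while classify_run(Stage.VALIDATION_OUTCOME) yields "data failure".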

Scope — today

What you can validate today.

Flat tabular data across the formats Azure Data Factory reads natively. This is what runs end-to-end right now, not a roadmap.

Runnable formats

Pair any two runnable datasets as source and target. ADF reads both sides in your Azure environment; General Validation stores result metadata, not source or target rows.

Runs end-to-end today

Parquet
Columnar files in ADF-managed storage paths.

CSV / delimited text
Flat delimited datasets with schema discovery.

Delta Lake
Lakehouse tables read and written inside Azure.

Azure SQL
Azure SQL Database, Azure SQL Managed Instance, and Synapse SQL.

Twelve validation checks

Grouped by the question they answer: are the records present, do the measures agree, do joined values match, and do sets reconcile?

tolerances · casts · evidence

Completeness

Prove the expected records actually landed.

row_count
Total rows, source vs target
count
Count of a chosen field
distinct_count
Distinct values of a field
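As a rough illustration of what the three completeness checks measure, the Python sketch below computes them over in-memory rows. It assumes that count ignores nulls in the chosen field, as SQL COUNT(column) does; the source does not say so. The function names and the sample rows are made up for the example, and the real checks run as an ADF data flow, not as Python.

```python
def completeness(rows, field_name):
    """Compute row_count, count, and distinct_count for one field."""
    # Assumption: count and distinct_count skip null values.
    present = [row[field_name] for row in rows if row.get(field_name) is not None]
    return {
        "row_count": len(rows),
        "count": len(present),
        "distinct_count": len(set(present)),
    }


def deltas(source_metrics, target_metrics):
    """Target minus source for each metric; zero means the sides agree."""
    return {name: target_metrics[name] - source_metrics[name] for name in source_metrics}


# Made-up sample rows: the third source row never reaches the target.
source_rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "EU"},
    {"id": 3, "region": None},
]
target_rows = source_rows[:2]

result = deltas(completeness(source_rows, "region"), completeness(target_rows, "region"))
```

In this sample only row_count moves, because the dropped row carried a null region. That is why a row count and a field count are separate checks.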

Aggregates

Catch numeric, date, and timestamp drift.

sum
Sum of a numeric field
avg
Average of a numeric field
min
Minimum value
max
Maximum value

Row Values

Compare records across joined pairs.

value
Row-level value match across a join
outer_value
Value match, counting unmatched rows
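The difference between the two row-value checks can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's implementation: value is modeled as an inner join on a key, and outer_value as the same comparison plus a list of keys found on only one side. Orders 4471 and 4472 use the amounts from the Result Detail panel; order 4473 is invented to show an unmatched row.

```python
from decimal import Decimal


def value_check(source, target, key, field_name):
    """Return (key, source value, target value, delta) for matched rows that differ."""
    target_by_key = {row[key]: row for row in target}
    diffs = []
    for row in source:
        match = target_by_key.get(row[key])
        if match is not None and row[field_name] != match[field_name]:
            delta = match[field_name] - row[field_name]
            diffs.append((row[key], row[field_name], match[field_name], delta))
    return diffs


def outer_value_check(source, target, key, field_name):
    """Same comparison, plus the keys present on only one side."""
    source_keys = {row[key] for row in source}
    target_keys = {row[key] for row in target}
    unmatched = sorted(source_keys ^ target_keys)
    return value_check(source, target, key, field_name), unmatched


source = [
    {"order_id": 4471, "amount": Decimal("120.00")},
    {"order_id": 4472, "amount": Decimal("75.50")},
    {"order_id": 4473, "amount": Decimal("10.00")},  # invented: missing from target
]
target = [
    {"order_id": 4471, "amount": Decimal("90.00")},
    {"order_id": 4472, "amount": Decimal("75.05")},
]

diffs, unmatched = outer_value_check(source, target, "order_id", "amount")
```

Decimal is used instead of float so that a delta such as −0.45 comes out exact.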

Sets

Check membership and equality of column sets.

A ⊆ B
Set A contained in B
B ⊆ A
Set B contained in A
A = B
Set equality
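The three set checks map directly onto set operations. The Python sketch below is a hypothetical illustration (set_checks and the sample region codes are not from the product); it also returns the offending values, on the assumption that a failed membership check is only useful if you can see what is missing.

```python
def set_checks(column_a, column_b):
    """Evaluate A ⊆ B, B ⊆ A, and A = B over the distinct values of two columns."""
    a, b = set(column_a), set(column_b)
    return {
        "A ⊆ B": a <= b,
        "B ⊆ A": b <= a,
        "A = B": a == b,
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Made-up sample: the target has one region code the source never contained.
checks = set_checks(["EU", "US", "EU"], ["EU", "US", "APAC"])
```

Here A ⊆ B holds, while B ⊆ A and A = B fail because "APAC" appears only in B.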

Numeric aggregates support tolerances.

Date, timestamp, and string checks are exact-match.

Casts are opt-in and validated before a run.
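These three rules can be sketched as small Python helpers. The tolerance here is assumed to be an absolute difference, which the source does not specify, and the function names are hypothetical. The cast check shows one way to validate a cast before a run: try it on the values and report what fails.

```python
from decimal import Decimal


def within_tolerance(source, target, tolerance=Decimal("0")):
    """Numeric aggregates: pass when the absolute difference is within tolerance."""
    return abs(target - source) <= tolerance


def exact_match(source, target):
    """Dates, timestamps, and strings: pass only on equality."""
    return source == target


def validate_cast(values, cast):
    """Return the values that cannot be cast; an empty list means the cast is safe."""
    failures = []
    for value in values:
        try:
            cast(value)
        except (ValueError, ArithmeticError):
            # decimal.InvalidOperation is a subclass of ArithmeticError.
            failures.append(value)
    return failures
```

With a source sum of Decimal("48210.55") and a hypothetical target of Decimal("48210.50"), a tolerance of 0.10 passes and a tolerance of 0.01 fails. validate_cast(["1", "2", "x"], int) reports ["x"], so the run can be stopped before it starts.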

Request early access

The evidence

Every run leaves a record you can defend.

Not another green checkmark — a small, repeatable record of what was compared, which rule failed, how far it drifted, and where to inspect the evidence.

When someone says the number's fine and you're not sure, you don't argue — you show the run.

Per-test results: source, target, delta, threshold, and PASS / FAIL
Run history with the four-stage lifecycle on every run
Bad-record drill-through, read on demand from your own storage
Reporting dashboard, per-test history, and CSV export

Validation run run_2f9c… for pair orders → orders_dw. Row-count check failed: source 1,000 versus target 998, delta minus 2. Sum, distinct-count, and max checks passed.
VALIDATION RUN · run_2f9c… pair: orders → orders_dw
Test	Source	Target	Status
row_count	1,000	998	FAIL Δ −2
sum(amount)	48,210.55	48,210.55	PASS
distinct(id)	1,000	1,000	PASS
max(updated)	2026-06-06	2026-06-06	PASS
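A per-test result of this shape can be modeled as a small record whose delta and status are derived from the other fields. The Python sketch below is an assumption about the shape only: CheckResult and to_csv are hypothetical names, the threshold is taken to be the largest absolute delta that still passes, and the three numeric rows are copied from the run shown above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CheckResult:
    test: str
    source: float
    target: float
    threshold: float = 0  # assumed: largest absolute delta that still passes

    @property
    def delta(self) -> float:
        return self.target - self.source

    @property
    def status(self) -> str:
        return "PASS" if abs(self.delta) <= self.threshold else "FAIL"


def to_csv(results) -> str:
    """Serialize results with the columns listed in the evidence section."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["test", "source", "target", "delta", "threshold", "status"])
    for r in results:
        writer.writerow([r.test, r.source, r.target, r.delta, r.threshold, r.status])
    return buffer.getvalue()


results = [
    CheckResult("row_count", 1000, 998),
    CheckResult("sum(amount)", 48210.55, 48210.55),
    CheckResult("distinct(id)", 1000, 1000),
]
report = to_csv(results)
```

Deriving delta and status instead of storing them keeps the record from contradicting itself.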

metadata only · runs in your Azure · evidence kept in your storage

Security & privacy

Your security team already said yes.

We store validation metadata and results — schemas, run status, metrics, diagnostics, and pointers to evidence — not your source or target rows. The data stays inside your Azure boundary, where it already lives.

Customer-owned Azure

It runs where your data already lives.

The application, worker, runtime, storage, secrets, and evidence all sit inside infrastructure your team controls.

No vendor data lake

Your rows never leave your tenant.

Source rows, target rows, validation outputs, and bad-record evidence stay in your Azure. We never copy them out.

Enterprise controls

The controls your security team expects.

OIDC / SSO sign-in, role-based access, tenant isolation, and audit logging — supported and tested, not a compliance badge.

Private beta

We're onboarding a limited number of design-partner teams this quarter — chosen for real Azure Data Factory workloads, not logos. Early partners get priority support and a direct line into the roadmap.

Get started

Be first to run it on your pipelines.

Request access, or book a call and we'll walk through your Azure Data Factory workload together: what to validate first, how it runs in your environment, and what your team gets back.

Request early access · Book a demo call →

Private beta · metadata only · runs in your Azure.
