A financial organization ingests data from multiple SQL Server systems and flat files. Data arrives with inconsistencies, schema drift, and no automated quality checks or anomaly detection. This demonstrates an end-to-end pipeline that builds trust in.
I demonstrate a staging → validation → warehouse → semantic-layer pipeline orchestrated with Python and Azure Data Factory. Logging tables track row counts, null checks, referential integrity, and schema drift; an AI anomaly-detection step (Isolation Forest / statistical outliers) flags unusual patterns; and a governance dashboard consumes those logs to show quality trends, lineage, and SLA compliance.
A financial organization ingests data from multiple SQL Server systems and flat files. Data arrives with inconsistencies, schema drift, and no automated quality checks or anomaly detection. Bad rows reach reporting before anyone notices, and QA is manual and reactive.
Every stage writes to logging tables, and the validation + anomaly gate sits between the warehouse and the semantic layer so only trusted data is published.
An Isolation Forest scores each batch for outliers; flagged rows are quarantined and surfaced to the governance dashboard rather than silently loaded.
from sklearn.ensemble import IsolationForest import pandas as pd def score_batch(df: pd.DataFrame, features: list) -> pd.DataFrame: """Flag statistical outliers before load. Conservative contamination.""" model = IsolationForest( n_estimators=200, contamination=0.01, random_state=42, ) scores = model.fit_predict(df[features]) df["is_anomaly"] = (scores == -1) return df # flagged rows are written to dq.AnomalyLog and moved to a quarantine table
The pipeline scores four dimensions plus an anomaly rate; the composite feeds the governance dashboard and the SLA report (values illustrative).
A single pane consumes pipeline logs and anomaly output to show quality trends and SLA compliance. Filter by source, hover the charts for values, or select a source to drill through to the anomaly log. Illustrative data.
| Row Key | Column | Expected | Actual | Score | Status |
|---|
The example system combines pipeline engineering with governance:
This example shows how combining ETL + quality + governance can, in practice: