Data Quality Contract Practice
A data team collaboration method that manages schema changes through contracts to prevent downstream failures

Introduction
Although data quality issues may seem like analysis team issues, they are actually directly related to service reliability. If event schemas are changed without a contract, dashboard numbers and operational decisions will be distorted. This article deals with how to establish the Data Contract as an operating protocol between teams.

Problem definition
In organizations without data contracts, the same event is interpreted with different meanings.
- Omission of required fields is discovered late in the deployment stage.
- There is no schema version management, so producer changes lead to consumer failures.
- Data freshness standards vary from team to team, so alert reliability is low.
Data quality is a responsibility boundary that comes before inspection tools. Producer and consumer must share the same contract document.
Key concepts
| perspective | Design criteria | Verification points |
|---|---|---|
| schema | Required fields/types/version definitions | breaking change detection |
| verification | Quality gate upon ingestion | invalid row rate |
| Freshness | Specify allowable delay section | freshness SLA compliance rate |
| Ownership | Data owner by domain | Issue response time |
A data contract will fail if it ends as a document. Contract verification must be built into CI and pipelines to be maintained in real-world operations.
Code example 1: Data contract definition
name: order_created
version: 1.2.0
owner: commerce-platform
required_fields:
- event_id
- occurred_at
- user_id
- order_id
field_types:
event_id: string
occurred_at: datetime
user_id: string
order_id: string
freshness_sla_minutes: 10
Code example 2: ingestion validation query
SELECT
COUNT(*) AS total_rows,
SUM(CASE WHEN event_id IS NULL THEN 1 ELSE 0 END) AS missing_event_id,
SUM(CASE WHEN occurred_at IS NULL THEN 1 ELSE 0 END) AS missing_occurred_at,
SUM(CASE WHEN occurred_at < NOW() - INTERVAL '1 day' THEN 1 ELSE 0 END) AS stale_rows
FROM raw_order_created
WHERE ingestion_date = CURRENT_DATE;
Architecture flow
Tradeoffs
- Strict contracts ensure quality but limit producer development speed to some extent.
- Increasing the verification stage is stable, but ingestion delay may increase.
- Subdividing version management is flexible, but increases operational complexity.
Cleanup
When it comes to data quality, contract culture is more important than technology introduction. By specifying schema, validation, and ownership, and connecting automatic validation, you can continuously maintain metric reliability.
Image source
- Cover: source link
- License: CC BY-SA 4.0 / Author: Sage (Wiki Ed)
- Note: After downloading the free license image from Wikimedia Commons, it was optimized to JPG at 1600px.