Overview
What is a Data Contract?
A data contract is a promise made by a data producer to its data consumers. When a consumer accepts that promise, it becomes a contract.
Imagine you're ordering a pizza. You're the data consumer and the pizza place is the data producer. A data contract is like the menu:
- it tells you what kind of pizzas (data) are available
- it lists the toppings and sizes (data properties)
- it might mention things like "fresh ingredients", "gluten free", or "30-minute delivery" (data guarantees)
When you place your order, you're agreeing to a specific pizza and size (data subscription). The pizza place guarantees they'll make the pizza according to the menu (Service Level Agreement - SLA). This way, you know what to expect and they know what to deliver.
Data contracts work similarly but for data instead of pizza. They establish a clear agreement between those who provide data (data producers) and those who use it (data consumers). This makes data exchange more reliable and predictable. Here's a breakdown in simpler terms:
- Data: the information being exchanged
  Example: daily transaction records from credit card processing
- Data Producer: prepares and provides the data
  Example: the payments department within a bank, responsible for processing and sharing transaction data with other teams
- Data Consumer: who wants to use the data and thus subscribes to the data contract
  Example: the risk management team, which uses transaction data to monitor for unusual activity and detect potential fraud
- Data Properties: these are details about the data, like its schema, structure, and quality
  Example: the transaction data schema includes fields such as `transaction_id` (string), `amount` (decimal), `transaction_date` (date), `merchant_category` (string), and `customer_id` (string). Data quality checks specify that `amount` must be positive, `transaction_date` should be within the last 24 hours, and `merchant_category` must match a predefined list
- SLA: a formal set of guarantees that outline the level of service you can expect
  Example: transaction data will be delivered within 15 minutes after each hour. 99.9% availability. Less than 0.5% missing or inaccurate records per month. Response time of 30 minutes for any data delivery issues.
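To make the Data Properties example more concrete, the schema and quality checks above could be sketched in Python roughly as follows. This is only an illustration, not how Witboost represents a contract: the `Transaction` class, the `check_quality` function, and the contents of `ALLOWED_MERCHANT_CATEGORIES` are hypothetical, and `transaction_date` is modelled as a timezone-aware datetime (an assumption) so that the 24-hour rule can be evaluated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Hypothetical predefined list; a real contract would supply its own values.
ALLOWED_MERCHANT_CATEGORIES = {"grocery", "travel", "fuel", "restaurants"}


@dataclass(frozen=True)
class Transaction:
    """One record matching the example schema."""
    transaction_id: str
    amount: Decimal
    transaction_date: datetime  # assumed timezone-aware for the 24-hour check
    merchant_category: str
    customer_id: str


def check_quality(tx: Transaction, now: datetime) -> list[str]:
    """Return the violated quality rules; an empty list means the record passes."""
    violations = []
    if tx.amount <= 0:
        violations.append("amount must be positive")
    age = now - tx.transaction_date
    if age < timedelta(0) or age > timedelta(hours=24):
        violations.append("transaction_date must be within the last 24 hours")
    if tx.merchant_category not in ALLOWED_MERCHANT_CATEGORIES:
        violations.append("merchant_category must match the predefined list")
    return violations


# Usage: a record with a negative amount and a 30-hour-old timestamp.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
tx = Transaction("tx-001", Decimal("-5.00"), now - timedelta(hours=30), "grocery", "cust-42")
print(check_quality(tx, now))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.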
A data contract helps ensure that both the data provider and the data consumer are on the same page regarding what data is being shared, how it can be accessed, and what quality standards it should meet. It brings clarity and trust to the process of exchanging data.
Data Contract Guardians
A data contract guardian is a controller (typically an automated system) that oversees data flows to ensure the data contract is consistently met.
If the guardian detects a contract violation, such as missing fields or delayed delivery, it may trigger alerts, reject the data, or take corrective action to keep the data reliable for consumers.
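As a rough illustration of the two violations mentioned above, a guardian's decision logic might look like the sketch below. All names (`Action`, `missing_fields`, `is_delayed`, `decide`) are hypothetical and not part of any Witboost API; the mapping of missing fields to rejection and of late delivery to an alert is likewise an assumption. The 15-minute threshold is borrowed from the SLA example earlier on this page.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

# Field names taken from the example schema.
REQUIRED_FIELDS = (
    "transaction_id", "amount", "transaction_date",
    "merchant_category", "customer_id",
)
# From the SLA example: delivery within 15 minutes after each hour.
MAX_DELAY = timedelta(minutes=15)


class Action(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    ALERT = "alert"


def missing_fields(record: dict) -> list[str]:
    """Names of required fields that are absent or None."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


def is_delayed(hour_end: datetime, delivered_at: datetime) -> bool:
    """True when data for the hour ending at hour_end arrived after the SLA window."""
    return delivered_at - hour_end > MAX_DELAY


def decide(record: dict, hour_end: datetime, delivered_at: datetime) -> set[Action]:
    """Assumed policy: reject incomplete records, alert on late delivery."""
    actions = set()
    if missing_fields(record):
        actions.add(Action.REJECT)
    if is_delayed(hour_end, delivered_at):
        actions.add(Action.ALERT)
    return actions or {Action.ACCEPT}
```

A real guardian would likely also cover the corrective actions mentioned above; they are left out here because the text does not specify them.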
Data Contracts in Witboost
Witboost provides comprehensive capabilities to:
- define, validate, and seamlessly manage the evolution of data contracts
- deploy data contracts and their guardians on the target infrastructure
- gather monitoring insights and alerts from data contract guardians, fully integrating with the Computational Governance Platform
- instantly notify data producers and consumers of any data contract violations
- track data contract status and issues within the Witboost Marketplace
The following sections of the documentation will delve deeper into each topic, using a sample data contract as a guiding reference:
The diagram represents a data flow setup between a Producer System and one or more consumers with a data contract governing the process.
Producer System:
- Data Ingestion Workload: periodically ingests data from a source system (e.g., CDC on a transactional database, ETL pipeline, ...) and produces messages for the landing topic
- Data Contract:
  - Landing Topic: message queue, entry point for the data contract
  - Data Contract Guardian: a workload that assesses each message against the data contract criteria and routes it based on compliance:
    - Non-Compliant Topic: stores messages that do not meet contract standards
    - Compliant Topic: stores messages that pass contract validation and makes them available for consumption
The data contract ensures that only compliant data reaches the consumers, while non-compliant data is isolated, creating a controlled and validated data pipeline.
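The routing described above can be sketched with in-memory queues standing in for the three topics. This is a minimal sketch under stated assumptions: `run_guardian` and `has_positive_amount` are hypothetical names, `collections.deque` is only a stand-in for a real message queue, and the single rule used here is one of the quality checks from the earlier example rather than a full contract.

```python
from collections import deque
from typing import Callable

Message = dict


def run_guardian(
    landing: deque,
    compliant: deque,
    non_compliant: deque,
    is_compliant: Callable[[Message], bool],
) -> None:
    """Drain the landing topic, routing each message by compliance."""
    while landing:
        message = landing.popleft()
        target = compliant if is_compliant(message) else non_compliant
        target.append(message)


def has_positive_amount(message: Message) -> bool:
    """Example rule: the amount field must be present and positive."""
    amount = message.get("amount")
    return isinstance(amount, (int, float)) and amount > 0


# Usage: three messages arrive on the landing topic.
landing = deque([
    {"transaction_id": "t1", "amount": 12.5},
    {"transaction_id": "t2", "amount": -3},
    {"transaction_id": "t3"},  # amount missing
])
compliant, non_compliant = deque(), deque()
run_guardian(landing, compliant, non_compliant, has_positive_amount)
```

After the run, only `t1` sits in the compliant queue; `t2` and `t3` are isolated in the non-compliant queue, so consumers reading from the compliant queue never see them.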