Creating a Data Contract

note

This section is part of a series that uses the sample data contract introduced in the Data Contracts Overview page as a reference.

This page explains how to set up the sample data contract described in the Data Contracts Overview page. Your platform team may streamline this process by providing ready-to-use data contract templates that you can instantiate directly from the UI. However, it’s essential to understand the structure of the target system descriptor.

System structure

As shown in the diagram of the sample data contract, the Data Contract is part of a Producer System. Before proceeding with this tutorial, ensure you have:

  • a system to host the data contract
  • a workload component within the system to function as the Data Ingestion Workload
  • a component within the system designated as the Data Contract component, which includes four subcomponents:
    • Landing Topic: a storage component (queue)
    • Non-Compliant Topic: a storage component (queue)
    • Compliant Topic: an output port component (queue)
    • Data Contract Guardian: a workload component

note

For this tutorial, you should already be familiar with creating systems and their components, and understand the role of catalog info files and descriptors.

Data Contract Descriptor

To designate a system component (or subcomponent) as a Data Contract, set the spec.mesh.__dataContractEnabled property to true in its catalog info file. This flag signals to Witboost that the component should be treated and validated as a Data Contract.

Fields

The final descriptor of a data contract component should include the following fields:

  • __dataContractEnabled (boolean)
    • set to true
    • mapped from property spec.mesh.__dataContractEnabled in the catalog info file (in case of top-level components)
  • dataContract (object)
    • definition of the data contract
    • mapped from property spec.mesh.dataContract in the catalog info file (in case of top-level components)
    • the recommended — but not mandatory — schema for this object is the one described in this specification (DataContract field)

The dataContract specification will be used by:

  • the data contract guardian, to determine which compliance checks to run
  • the Witboost Marketplace, once the system is published, to make the data contract specification accessible to potential consumers

note

The structure of the dataContract object is flexible, depending on the needs of guardian tech adapters and marketplace custom views. Your Platform Team should define a recommended schema for this object and enforce it through governance policies.

Based on the fields above, the sample data contract component descriptor looks like this:

id: urn:dmb:cmp:domain:producer-system:0:data-contract
name: Data Contract
consumable: false
__dataContractEnabled: true
dataContract:
  schema:
    - name: message-field-1
      dataType: string
    - name: message-field-2
      dataType: boolean
      constraint: NOT_NULL
  SLA:
    upTime: 99.9%
  # ...
components: # subcomponents
  - id: urn:dmb:cmp:domain:producer-system:0:data-contract:landing-topic
    name: Landing Topic
    kind: storage
    technology: Kafka
    consumable: false
    # ...

  - id: urn:dmb:cmp:domain:producer-system:0:data-contract:non-compliant-topic
    name: Non-Compliant Topic
    kind: storage
    technology: Kafka
    consumable: false
    # ...

  - id: urn:dmb:cmp:domain:producer-system:0:data-contract:compliant-topic
    name: Compliant Topic
    kind: outputport
    technology: Kafka
    consumable: true
    # ...

  - id: urn:dmb:cmp:domain:producer-system:0:data-contract:guardian
    name: Data Contract Guardian
    kind: workload
    consumable: false
    # ...

The Compliant Topic is the only consumable component since it is the system's output port and should be displayed in the Witboost Marketplace, ready to be accessed by consumers.

note

You are free to define any data contract structure you prefer. The example provided is one possible approach among many.

Other alternatives include:

  • a consumable data contract component without subcomponents, where the guardian is another top-level component in the system
  • a consumable data contract subcomponent as part of a non-data-contract parent component, with a guardian as a sibling subcomponent or another component in the system

Distributed data contract definition

In cases where the data contract is a parent component with multiple consumable subcomponents, the subcomponent descriptors may optionally include a dataContract field to complement the parent’s data contract definition.

warning

If a component is a Data Contract, its subcomponents cannot also be marked as Data Contracts, but they can extend their parent’s definition.

In this scenario, the parent component defines the global data contract, setting expectations for all its consumable subcomponents. Each subcomponent can then add assertions of its own.

For example, the data contract descriptor in the parent component might require a change interval of two days, which will apply to each exposed port (the consumable subcomponents). The consumable subcomponents, which may expose different schemas, can include unique assertions on the schema in their individual dataContract field.

We call this pattern Distributed data contract definition: the data contract is unified under the parent component, but its definition is distributed across multiple descriptors.

Example:

name: Daily Transaction Summary Data Contract
consumable: false
__dataContractEnabled: true
dataContract:
  SLA:
    upTime: 99.9%
    timeliness: 1 day
  # ...
components: # subcomponents
  - name: Financial Reporting Summary
    kind: outputport
    consumable: true
    dataContract:
      schema:
        - name: date
          dataType: date
          description: 'Date of the transaction summary'
        - name: total_transactions
          dataType: int
          description: 'Total number of transactions for the day'
        - name: total_volume
          dataType: float
          description: 'Total transaction volume in the specified currency'
        # ...
    # ...
  - name: Fraud Trend Analysis Summary
    kind: outputport
    consumable: true
    dataContract:
      schema:
        - name: date
          dataType: date
          description: 'Date of the transaction summary'
        - name: total_transactions
          dataType: int
          description: 'Total number of transactions for the day'
        - name: high_risk_transaction_count
          dataType: int
          description: 'Count of transactions flagged as high risk'
        # ...
    # ...
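One way to picture the distributed pattern is as an overlay: each consumable subcomponent inherits the parent's definition and adds its own keys on top. The Python sketch below is illustrative only. The helper name `effective_contracts` and the shallow-merge rule are assumptions made for this example; the page does not specify how Witboost or a guardian tech adapter actually combines the two definitions.

```python
from typing import Any, Dict


def effective_contracts(parent: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Overlay each consumable subcomponent's dataContract on the parent's.

    Hypothetical helper: a shallow merge in which subcomponent keys are
    added to (or override) the parent's top-level keys.
    """
    base = parent.get("dataContract") or {}
    merged = {}
    for sub in parent.get("components") or []:
        if not sub.get("consumable", False):
            continue  # only exposed ports carry the contract
        merged[sub["name"]] = {**base, **(sub.get("dataContract") or {})}
    return merged


# Trimmed-down version of the descriptor shown above, as a Python dict.
descriptor = {
    "name": "Daily Transaction Summary Data Contract",
    "consumable": False,
    "__dataContractEnabled": True,
    "dataContract": {"SLA": {"upTime": "99.9%", "timeliness": "1 day"}},
    "components": [
        {
            "name": "Financial Reporting Summary",
            "kind": "outputport",
            "consumable": True,
            "dataContract": {"schema": [
                {"name": "date", "dataType": "date"},
                {"name": "total_transactions", "dataType": "int"},
                {"name": "total_volume", "dataType": "float"},
            ]},
        },
        {
            "name": "Fraud Trend Analysis Summary",
            "kind": "outputport",
            "consumable": True,
            "dataContract": {"schema": [
                {"name": "date", "dataType": "date"},
                {"name": "total_transactions", "dataType": "int"},
                {"name": "high_risk_transaction_count", "dataType": "int"},
            ]},
        },
    ],
}

contracts = effective_contracts(descriptor)
for port, contract in contracts.items():
    fields = [field["name"] for field in contract.get("schema", [])]
    print(port, "| timeliness:", contract["SLA"]["timeliness"], "| fields:", fields)
```

Both output ports end up with the parent's SLA, while each keeps its own schema.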

Data Contract settings

The data contract definition also includes its settings.

By default, data contract settings are expected under the dataContract.settings property. However, your Platform Team may have configured different paths in the descriptor.

tip

When defining a data contract, check with your Platform Team to confirm the expected paths for data contract settings in the descriptor. Your organization has likely established a governance policy to validate these settings before the descriptor is deployed.

| Property | Default path | Type | Required | Description |
|---|---|---|---|---|
| Ingestion Mode | dataContract.settings.ingestionMode | string (enum) | No (default: DATA_AT_REST) | Specifies how data is ingested by the data contract. |
| On-broken-contract behaviour | dataContract.settings.onBrokenContract | string (enum) | No (default: RED_FLAG) | Describes the action taken when a data contract is violated. |
| Description | dataContract.settings.description | string | No | A brief description of the data contract |

Allowed values for Ingestion Mode:

  • DATA_AT_REST: data is read from a static storage location, such as a data lake or database, where data remains stable until retrieved. This is typical for batch processing or periodically accessed datasets.
  • PUSH: data is actively pushed into the system, often in real time, through an event-driven mechanism such as a message queue or stream. This is common in systems requiring immediate data updates, like event or transaction processing.

Allowed values for On-broken-contract behaviour:

  • RED_FLAG: an alert is issued, but non-compliant data may still be accessible
  • CIRCUIT_BREAK: non-compliant data is withheld, which may delay exposure of the latest data. This is the scenario of the leading example.

These settings primarily serve as documentation for data consumers. The Witboost Marketplace considers them when displaying the data contract.

Example:

id: urn:dmb:cmp:domain:producer-system:0:data-contract
name: Data Contract
consumable: false
__dataContractEnabled: true
dataContract:
  schema:
    - name: message-field-1
      dataType: string
    - name: message-field-2
      dataType: boolean
      constraint: NOT_NULL
  SLA:
    upTime: 99.9%
  settings:
    ingestionMode: PUSH # messages are pushed into the landing topic
    onBrokenContract: CIRCUIT_BREAK # non-compliant messages are never exposed and are instead forwarded to the Non-Compliant Topic
    description: A sample circuit-break streaming data contract
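The defaults and allowed values listed for these settings can be made concrete with a small sketch. The Python below is an illustration, not Witboost's implementation: the function name `resolve_settings` is hypothetical, and it assumes the settings live at the default path dataContract.settings (your Platform Team may have configured a different one).

```python
from typing import Any, Dict

# Documented defaults for the two enum settings.
DEFAULTS = {"ingestionMode": "DATA_AT_REST", "onBrokenContract": "RED_FLAG"}

# Documented allowed values.
ALLOWED = {
    "ingestionMode": {"DATA_AT_REST", "PUSH"},
    "onBrokenContract": {"RED_FLAG", "CIRCUIT_BREAK"},
}


def resolve_settings(descriptor: Dict[str, Any]) -> Dict[str, Any]:
    """Read dataContract.settings, fill in defaults, reject unknown enum values.

    Hypothetical helper that assumes the default settings path.
    """
    raw = (descriptor.get("dataContract") or {}).get("settings") or {}
    settings = {**DEFAULTS, **raw}
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(
                f"{key}: {settings[key]!r} is not one of {sorted(allowed)}"
            )
    return settings


# The settings from the example above, as a Python dict.
example = {
    "dataContract": {
        "settings": {
            "ingestionMode": "PUSH",
            "onBrokenContract": "CIRCUIT_BREAK",
            "description": "A sample circuit-break streaming data contract",
        }
    }
}

print(resolve_settings(example))  # explicit values are kept
print(resolve_settings({}))       # nothing set: documented defaults apply
```

A check of this kind is what a governance policy might run before deployment, so that a misspelled enum value is caught early rather than silently ignored.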

Guardian descriptor

A component is considered a data contract guardian if it defines the __dataContractGuardianSpec property in its descriptor, mapped from spec.mesh.__dataContractGuardianSpec (in case of top-level components) in the catalog info file.

Fields

The only required field inside __dataContractGuardianSpec is guards, an array of objects with these attributes:

| Property | Type | Required | Description |
|---|---|---|---|
| dataContractId | string | Yes | URN of a data contract component or subcomponent, in the same system, monitored by this guardian |
| monitoringResultScheduling | object | No | Information about the expected monitoring result frequency (see section below) |

In the leading example:

id: urn:dmb:cmp:domain:producer-system:0:data-contract
name: Data Contract
consumable: false
__dataContractEnabled: true
dataContract:
  schema:
    - name: message-field-1
      dataType: string
    - name: message-field-2
      dataType: boolean
      constraint: NOT_NULL
  SLA:
    upTime: 99.9%
  # ...
components: # subcomponents
  # [topic subcomponents omitted for clarity]

  - id: urn:dmb:cmp:domain:producer-system:0:data-contract:guardian
    name: Data Contract Guardian
    kind: workload
    consumable: false
    __dataContractGuardianSpec:
      guards:
        - dataContractId: urn:dmb:cmp:domain:producer-system:0:data-contract
    # ...

info

A data contract guardian can guard multiple data contracts (i.e., components or subcomponents with __dataContractEnabled: true), provided they are defined within the same system descriptor as the guardian.

Monitoring result scheduling

A guardian is responsible for periodically verifying the compliance of the data contract and reporting the monitoring results to the Computational Governance Platform.

The monitoringResultScheduling property, when provided, contains the following details:

| Property | Type | Required | Description | Example |
|---|---|---|---|---|
| frequency | string | Yes | The expected result frequency as a cron expression in Quartz-like syntax with 6 fields that go from seconds to day of week in the following order: Seconds (0-59), Minutes (0-59), Hour of Day (0-23), Day of Month (1-31), Month (1-12), Day Of Week (0-6 where 0 is Monday). The following special characters are not allowed: L, W, LW, #. Time zone: UTC | 0 0 0,12 ? * * (every day at midnight and midday) |
| toleranceWindowLength | object | No (default: PT1H) | An ISO-8601 duration string (PnDTnHnMnS format) that specifies the length of the tolerance window after each scheduled frequency time. The tolerance window starts at the exact time of the cron expression, and monitoring results are accepted as "on time" if they are received within this period. If no result is received within this window, a delay is reported | PT2H30M (two and a half hour duration) |

tip

# ...
__dataContractGuardianSpec:
  guards:
    - dataContractId: urn:dmb:cmp:domain:producer-system:0:data-contract
      monitoringResultScheduling:
        frequency: 0 0 0,12 ? * *
        toleranceWindowLength: PT2H30M
# ...

In this example, the guardian is expected to provide monitoring results during the following time windows:

  • from midnight (12:00 AM) until 2:30 AM.
  • from midday (12:00 PM) until 2:30 PM.

Scenario:

  • suppose it’s 3:00 PM, and the result for the midday to 2:30 PM window has not been received yet
  • in this case, the guardian has missed the 2:30 PM deadline for providing the monitoring result within the tolerance window (which is PT2H30M, or 2.5 hours from the scheduled time of 12:00 PM)
  • a warning will appear both on the data contract page and the graph in the Witboost Marketplace. It will remain visible until the monitoring result is finally received
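The arithmetic behind this scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: `parse_duration` and `warning_visible` are hypothetical helpers, the parser handles only whole-number PnDTnHnMnS durations, the cron expression is not parsed (the scheduled time is passed in directly), the calendar date is arbitrary, and treating the window's end as inclusive is a guess.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Whole-number PnDTnHnMnS durations only (a subset of ISO-8601).
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> timedelta:
    """Convert a string such as 'PT2H30M' into a timedelta."""
    match = _DURATION.fullmatch(text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


def warning_visible(
    scheduled: datetime,
    tolerance: str,
    now: datetime,
    received_at: Optional[datetime] = None,
) -> bool:
    """True when the tolerance window has closed and no result has arrived yet.

    Assumption: a late result clears the warning, matching the statement
    that the warning stays visible until the result is finally received.
    """
    deadline = scheduled + parse_duration(tolerance)
    return received_at is None and now > deadline


# Scenario from the text: midday schedule, PT2H30M tolerance (times in UTC).
midday = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary date

print(midday + parse_duration("PT2H30M"))                          # deadline: 14:30 UTC
print(warning_visible(midday, "PT2H30M", midday.replace(hour=14)))  # still inside the window
print(warning_visible(midday, "PT2H30M", midday.replace(hour=15)))  # window missed
```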
note

Checks on the monitoring result scheduling are run only once the guardian with the scheduling spec is deployed.

Infrastructure template

The infrastructureTemplateId field in the descriptor of a component identifies a microservice — a tech adapter — responsible for managing the component's provisioning.

The infrastructureTemplateId for a guardian component must be associated with a dedicated computational policy to collect validation results from the guardian.

warning

Confirm with your Platform Team that the infrastructureTemplateId assigned to the guardian has been registered as a valid guardian infrastructure template ID.

Constraints and validations

When a system includes data contracts and guardians, Witboost checks for the following conditions. If any are violated, you will not be able to create a release from the system descriptor, and an error message will list the issues.

| Assertion | Notes |
|---|---|
| A data contract component (__dataContractEnabled: true) cannot contain a data contract subcomponent (__dataContractEnabled: true) | Subcomponents may still declare a dataContract property, as in the Distributed data contract definition scenario described above |
| A data contract component must be consumable (consumable: true) or contain at least one consumable subcomponent | A consumable component cannot be the parent of a consumable subcomponent |
| A data contract guardian can only guard data contracts defined within its parent system descriptor (__dataContractGuardianSpec.guards[i].dataContractId) | This ensures the data contract and its guardian are part of the same system and get deployed together |
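To make the three assertions concrete, the following Python sketch checks them against a list of top-level component descriptors. It is an illustration of the rules as stated on this page, not Witboost's validator: the function name `validate_system`, the error wording, and the assumption that nesting is at most one level deep (components and their subcomponents) are all choices made for this example.

```python
from typing import Any, Dict, List


def validate_system(components: List[Dict[str, Any]]) -> List[str]:
    """Return one message per violated assertion (empty list if all hold)."""
    # Pair every node with its parent: top-level components have none.
    nodes = []
    for comp in components:
        nodes.append((comp, None))
        nodes.extend((sub, comp) for sub in comp.get("components") or [])

    contract_ids = {n["id"] for n, _ in nodes if n.get("__dataContractEnabled")}
    errors = []

    for node, parent in nodes:
        if node.get("__dataContractEnabled"):
            # Assertion 1: no data contract nested inside a data contract.
            if parent is not None and parent.get("__dataContractEnabled"):
                errors.append(f"{node['id']}: nested in data contract {parent['id']}")
            # Assertion 2: consumable itself, or parent of a consumable subcomponent.
            children = node.get("components") or []
            if not (node.get("consumable") or any(c.get("consumable") for c in children)):
                errors.append(f"{node['id']}: no consumable component or subcomponent")
        # Assertion 3: guards must point at data contracts of this system.
        spec = node.get("__dataContractGuardianSpec") or {}
        for guard in spec.get("guards") or []:
            if guard.get("dataContractId") not in contract_ids:
                errors.append(f"{node['id']}: guards unknown data contract {guard.get('dataContractId')}")
    return errors


# The leading example, reduced to the fields the checks read.
BASE = "urn:dmb:cmp:domain:producer-system:0:data-contract"
sample = [{
    "id": BASE,
    "consumable": False,
    "__dataContractEnabled": True,
    "components": [
        {"id": f"{BASE}:landing-topic", "kind": "storage", "consumable": False},
        {"id": f"{BASE}:non-compliant-topic", "kind": "storage", "consumable": False},
        {"id": f"{BASE}:compliant-topic", "kind": "outputport", "consumable": True},
        {
            "id": f"{BASE}:guardian",
            "kind": "workload",
            "consumable": False,
            "__dataContractGuardianSpec": {"guards": [{"dataContractId": BASE}]},
        },
    ],
}]

print(validate_system(sample))  # the leading example passes all three checks
```

Returning a list of messages rather than stopping at the first failure mirrors the behaviour described above, where the error message lists all issues at once.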

Additional governance policies, defined by your Platform or Governance Team, may be applied to verify data contract structure and completeness based on organizational requirements. These will be checked when testing or deploying a system descriptor.