Testing Data Products with Computational Policies
Testing a data product is essential to verify its quality and readiness for production. This process involves evaluating output ports, workloads, and compliance with data policies to ensure reliable and meaningful results. By running these tests, you can identify and resolve issues early, saving time and improving product trustworthiness and overall data governance.
1. Introduce Data Product Testing
Okay, we are going to look at how to test a Data Product.
2. Describe Product Components
As an example, this is a product composed of two output ports and one workload.
3. Access Product Interface
Each component contributes to the overall Data Product descriptor.
4. Explain Descriptor Creation
Witboost automatically creates the full descriptor based on the target environment.
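To make the idea concrete, here is a minimal sketch of how a descriptor could be assembled from the components for a given target environment. The field names (`name`, `environment`, `components`, `kind`) and the `build_descriptor` helper are assumptions for illustration only, not Witboost's actual descriptor schema or API.

```python
from typing import Any


def build_descriptor(
    name: str, environment: str, components: list[dict[str, Any]]
) -> dict[str, Any]:
    """Assemble one descriptor from the product's components (hypothetical layout)."""
    return {
        "name": name,
        "environment": environment,
        "components": [
            # Copy each component so stamping the environment does not mutate the input.
            {**component, "environment": environment}
            for component in components
        ],
    }


# Two output ports and one workload, as in the example product (ids are made up).
components = [
    {"id": "output-port-1", "kind": "outputport"},
    {"id": "output-port-2", "kind": "outputport"},
    {"id": "workload-1", "kind": "workload"},
]
descriptor = build_descriptor("example-product", "production", components)
```

Building the descriptor per environment means the same components can be tested against the exact configuration they would have once promoted.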
5. Test
This sidebar opens the test panel.
6. Run a test
Once here, we can run a test to check whether our Data Product is ready to be promoted to production.
7. Initiate Test Run
Let's run it.
8. Test Results
We now get the results from all the policies that have been enabled for the Data Product.
9. Report Test Result
Each policy returns one of three possible outcomes: failure, success, or warning.
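The three outcomes can be modeled as a small enumeration. The sketch below is illustrative only: the policy names are hypothetical, and the rule that only failures block promotion while warnings do not is an assumption, not something confirmed by the platform.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILURE = "failure"


def ready_for_promotion(results: dict[str, Outcome]) -> bool:
    """Assumed rule: any failure blocks promotion; warnings do not."""
    return all(outcome is not Outcome.FAILURE for outcome in results.values())


# Hypothetical policy names and outcomes, for illustration only.
results = {
    "data-quality": Outcome.FAILURE,
    "metadata-compliance": Outcome.WARNING,
    "naming-convention": Outcome.SUCCESS,
}

# Names of the policies that would block promotion under the assumed rule.
blocking = sorted(
    name for name, outcome in results.items() if outcome is Outcome.FAILURE
)
```

Keeping warnings separate from failures lets a team surface problems early without stopping every deployment.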
10. Identify Missing Constraints
In this case, we can see that we didn't implement Data Quality and that we are not fully compliant with architectural requirements, such as exposing output ports with GraphQL.
11. Metadata Issues
We also have some issues with metadata compliance and DORA classification. Let's check the details.
12. Review Policy Problems
To better understand the issue, we can open the details. In this specific case, it is clear that we are missing one field.
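A metadata policy of this kind can be thought of as a required-field check. The sketch below assumes a hypothetical list of mandatory fields (`REQUIRED_FIELDS`) and a flat metadata dictionary. The real policy and field names are not shown in this walkthrough.

```python
# Hypothetical mandatory fields; the real policy defines its own list.
REQUIRED_FIELDS = ("name", "domain", "owner")


def missing_fields(
    metadata: dict[str, object], required: tuple[str, ...] = REQUIRED_FIELDS
) -> list[str]:
    """Return the required fields that are absent, None, or an empty string."""
    return [field for field in required if metadata.get(field) in (None, "")]


# Example metadata with one field left out.
metadata = {"name": "example-product", "domain": "finance"}
problems = missing_fields(metadata)
```

Returning the list of missing fields, rather than a plain pass or fail, is what makes the detail view useful: it tells the developer exactly what to fix.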
13. Close the details
14. Description Deficiency
In the same way, we are missing meaningful descriptions in our data contract.
15. Schema Issues
Descriptions are really important to create the proper context and business documentation around Data Products.
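One simple way to approximate "meaningful" is to require a minimum description length for every column in the data contract schema. This is only a sketch: the schema layout, the `columns_without_description` helper, and the 10-character threshold are all assumptions made for illustration.

```python
def columns_without_description(
    schema: list[dict[str, str]], min_length: int = 10
) -> list[str]:
    """Return column names whose description is missing or shorter than min_length."""
    return [
        column["name"]
        for column in schema
        if len(column.get("description", "").strip()) < min_length
    ]


# Hypothetical data contract schema.
schema = [
    {"name": "customer_id", "description": "Unique identifier of the customer"},
    {"name": "amount", "description": "amt"},
    {"name": "currency"},
]
undocumented = columns_without_description(schema)
```

A length threshold is a crude proxy, but it catches empty or placeholder descriptions before they reach the Marketplace.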
16. Open Descriptor Details
17. Discuss Data Duplication Issue
From this other policy, we can also see that we are about to duplicate too much information that is already present in the Data Product Marketplace.
18. Identify Duplication in Product
This policy compares the schemas of the Data Product we are testing with all the schemas that have already been published in the Marketplace.
19. Relate Duplication to Churn
We can also explore which of the existing Data Products are most similar to the one we are developing. It is important to detect this kind of issue before we complete all the development work.
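The walkthrough does not say how similarity is computed. As one possible illustration, the sketch below uses Jaccard similarity over column names to rank published schemas against the one under test. The product names, columns, and helper functions are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of column names the two schemas have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(
    candidate: set[str], published: dict[str, set[str]]
) -> list[tuple[str, float]]:
    """Rank published schemas by similarity to the candidate, highest first."""
    scores = [(name, jaccard(candidate, columns)) for name, columns in published.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical schema under test and schemas already in the Marketplace.
candidate = {"customer_id", "order_id", "amount"}
published = {
    "customers": {"customer_id", "email"},
    "orders": {"order_id", "customer_id", "amount", "currency"},
}
ranked = most_similar(candidate, published)
```

Here `orders` shares three of four distinct columns with the candidate, so it ranks first. A policy could raise a warning or failure when the top score exceeds a chosen threshold.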
Testing these areas and creating a quality gate helps maintain high standards and reliable Data Products, even when they are developed in a highly decentralized way.