Configuring a Dataset

Data quality tests are applied to a dataset, so a dataset must be configured before any tests can be created.

Each dataset is typically matched to a table in K. This allows K to track the DQ score results per dataset.

Future enhancements to KDQ will allow dataset scores to be assigned to different tables in K.

Prerequisites

Connection configured
At least one Job configured

Creating a dataset

Select a workspace
On the Dataset tab, click Create Dataset
Configure the dataset details by filling in the following fields:
1. Connection - select a workspace connection to use
2. Dataset scope - select an option for the scope
  1. Select all records: Tests all records in the dataset
  2. Custom query: Customise the dataset query. Use this option to test a sample of the data (e.g. SELECT * FROM [schema].[table] LIMIT 1000), only new data (e.g. SELECT * FROM [schema].[table] WHERE last_updated_at >= DATEADD(hour, -24, CURRENT_TIMESTAMP())), or any other custom query that you want to use
3. Link asset in K - The target asset in K to associate the DQ results with
4. Dataset name - Name of this dataset
5. Dataset description - Description for this dataset
6. Job - The job to associate the dataset and its DQ tests with
7. Click Next
After the dataset is validated, set the primary key (a column in the dataset).
1. Click the Primary key drop-down and select a column to use
  
  The primary key is used in the KDQ results to identify which records have failed a test.
  
  Where the primary key is a composite key that is calculated at run time, you can add it as a column in the custom query under Dataset scope.
  
  Note: If the key is not included in the table in K, the primary key cannot be used as a test target.
2. Click Save
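
The composite key case above can be tried out locally before pasting a query into the Custom query field. The sketch below is illustrative only: the table `order_lines`, its columns, and the helper `key_is_unique` are hypothetical names, and Python's built-in sqlite3 module stands in for your actual connection. It builds a key from two columns inside the query and checks that the key has no NULLs or duplicates. This check is a suggested precaution, not something KDQ is documented to run; the `||` concatenation operator may also differ in your database.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
CUSTOM_QUERY = """
SELECT
    order_id || '-' || line_number AS order_line_key,  -- composite key built at run time
    order_id,
    line_number,
    amount
FROM order_lines
"""


def key_is_unique(conn, query, key_column):
    """Return True if key_column has no NULLs and no duplicates in the query result."""
    sql = (
        f"SELECT COUNT(*), COUNT({key_column}), COUNT(DISTINCT {key_column}) "
        f"FROM ({query}) AS d"
    )
    total, non_null, distinct = conn.execute(sql).fetchone()
    return total == non_null == distinct


# In-memory database standing in for the real source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, line_number INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, 1, 9.5), (1, 2, 3.0), (2, 1, 7.25)],
)

print(key_is_unique(conn, CUSTOM_QUERY, "order_line_key"))
```

If the check passes, the aliased column (here `order_line_key`) is the one you would expect to pick from the Primary key drop-down after the dataset is validated.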
