Data Quality

New & Improved

Data Quality Functions

The Data Quality Functions module enables centralized management of system-defined (read-only) and custom functions for data objects such as tables, columns, and files. Key updates include:

  • Custom Function Management: Create and edit functions with details such as object type, name, description, and dimension (e.g., Uniqueness, Completeness).

  • Function Details Panel: View function name, type, status, and description.

  • Criteria Configuration: Define logic and success criteria for data validation.

  • Function Queries: View connector-specific queries and supported data types.

  • Help Documentation: Access function help with formulas, examples, and pass/fail scenarios.

  • History Tracking: Track changes with username, date, and time for auditability.

Connector and Function Support

  • Summary View: A new Summary View displays the number of supported Data Quality functions for each connector, including the connector name, connector category, counts of object-level functions (system/custom) and attribute-level functions (system/custom), and the total function count, which is the combined number of object-level and attribute-level functions. Clicking any function count opens the Data Quality Functions tab with filters applied, showing details such as object type, function name, and supported data or file types.

  • Data Quality Functions: A new Data Quality Functions tab provides a detailed view of all data quality functions across connectors, including the object type, connector, function name, creation type (system or custom), and applicable data types.

Object Status Management in DQRs

A new status management feature in Data Quality Rules enables tracking and control of associated objects (tables, columns, files, codes) with Active or Inactive status.

  • Objects are Active by default when a rule is created and become Inactive if a connector license is removed or if their data type changes.

  • Stewards and Role Admins can update status from the 9-dot menu on the Object Detail page, while other users have view-only access.

  • Rules run only on Active objects, with Inactive ones skipped automatically.

  • The configuration page displays and filters objects by status. During rule edits or additions, the system validates license and data type compatibility, blocking the action and showing an error if a check fails (see the sketch below).
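
The skip-and-validate behavior can be pictured with a minimal sketch. The class and function names below are hypothetical illustrations of the described logic, not the product's API:

```python
from dataclasses import dataclass

@dataclass
class RuleObject:
    name: str
    status: str              # "Active" or "Inactive"
    data_type: str
    connector_licensed: bool

def validate_for_rule(obj: RuleObject, expected_type: str) -> None:
    # Mirrors the edit-time checks: a missing connector license or a
    # changed data type blocks the action with an error.
    if not obj.connector_licensed:
        raise ValueError(f"{obj.name}: connector license has been removed")
    if obj.data_type != expected_type:
        raise ValueError(f"{obj.name}: data type changed to {obj.data_type}")

def objects_to_run(objects: list[RuleObject]) -> list[RuleObject]:
    # Rules execute only on Active objects; Inactive ones are skipped.
    return [o for o in objects if o.status == "Active"]
```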

Incremental Rule Execution for Data Quality

Data Quality Rules can now run incrementally on large datasets. Instead of scanning the entire table, rules evaluate only newly added or updated records by specifying an incremental tracking column, such as last_updated or created_at. This column identifies changes made since the last successful rule execution. The feature can be enabled at the rule level for both system-defined and custom rules.
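
As a rough sketch of the idea (the function, table, and column names here are illustrative assumptions, not the product's implementation), an incremental run simply narrows the rule's scan with a predicate on the tracking column:

```python
from datetime import datetime

def build_rule_query(table: str, rule_predicate: str,
                     tracking_column: str | None = None,
                     last_successful_run: datetime | None = None) -> str:
    # Base query: scan every row the rule applies to.
    query = f"SELECT * FROM {table} WHERE {rule_predicate}"
    if tracking_column and last_successful_run:
        # Incremental mode: evaluate only rows added or updated
        # after the last successful execution of this rule.
        query += f" AND {tracking_column} > '{last_successful_run.isoformat()}'"
    return query

# Full scan vs. incremental scan of the same rule:
print(build_rule_query("orders", "amount IS NULL"))
print(build_rule_query("orders", "amount IS NULL",
                       tracking_column="last_updated",
                       last_successful_run=datetime(2025, 6, 1)))
```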

Improved Data Quality Score Calculation

In the Data Catalog, the Data Quality Score now uses row-level calculations to provide a more accurate assessment of data quality. Previously, the score was determined by the number of rules that passed or failed, which could misrepresent quality, especially when multiple rules applied to a single table. If all rules failed, the Quality Index displayed no score, making it difficult to evaluate the table’s condition.

With the updated approach, the score reflects the percentage of rows that pass or fail data quality checks. This change ensures visibility into data quality, even when rules fail.
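
A hypothetical example illustrates the difference; the scoring formulas below are simplified assumptions, not the product's exact calculation. Suppose three rules run against a 1,000-row table:

```python
# (rows passed, rows failed) per rule on a 1,000-row table (hypothetical data).
rule_results = [(990, 10), (0, 1000), (900, 100)]

# Rule-based score: fraction of rules that passed outright.
rules_passed = sum(1 for passed, failed in rule_results if failed == 0)
rule_based_score = 100 * rules_passed / len(rule_results)

# Row-based score: fraction of all row evaluations that passed.
rows_passed = sum(p for p, _ in rule_results)
rows_checked = sum(p + f for p, f in rule_results)
row_based_score = 100 * rows_passed / rows_checked

print(f"Rule-based: {rule_based_score:.0f}%")   # 0%: every rule had failures
print(f"Row-based:  {row_based_score:.0f}%")    # 63%: most rows are still good
```

Even though every rule had failures, the row-based score still conveys that most of the underlying data is sound.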

Data Quality Remediation Fields Configuration

In Data Quality Rules, metadata fields—Monetary Value (business impact), Criticality (severity level), Violation Message (issue description), and Corrective Action (resolution steps)—can now be configured independently of the remediation setting. Previously, these fields were available only when failed rows or values were sent to the Remediation Center, limiting the ability to fully define a rule.

With this update, Stewards and Admins can capture the intent and business impact of a rule without enabling remediation. When the remediation toggle is off, the metadata fields remain visible and editable in draft status. All other rule validation checks remain unchanged.

Remediation Support for UDF-Based Rules

In the Data Quality Remediation Center, remediation SQL generation now supports User Defined Functions (UDFs). Previously, the system generated remediation SQL only for rules using Out-of-the-Box (OOTB) functions, which limited remediation options for UDF-based rules.

With this update, remediation SQL for UDFs is generated from the Failed Values Query defined in the rule and is visible within the Remediation Center. The query can be reviewed and executed directly to address data quality issues.
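
A minimal sketch of the idea, assuming a hypothetical UDF is_valid_email and a simplified wrapper; the product's actual SQL generation may differ:

```python
def remediation_sql(failed_values_query: str) -> str:
    # Derive the remediation query from the rule's Failed Values Query so the
    # failing rows can be reviewed and acted on from the Remediation Center.
    return f"SELECT * FROM ({failed_values_query}) AS failed_rows"

# Failed Values Query of a UDF-based rule (is_valid_email is a hypothetical UDF):
fvq = "SELECT id, email FROM customers WHERE is_valid_email(email) = 0"
print(remediation_sql(fvq))
```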

Data Quality Support Extended to BOX Connector (BETA)

The Data Quality module now supports the BOX connector, allowing users to apply data quality rules to files stored in the BOX platform. With this enhancement, users can assess and monitor the quality of structured data within supported file formats (such as CSV and XLSX) directly from BOX.

File Function Standardization in DQ (BETA)

In Data Quality Rules, all functions related to files and file columns have been standardized to ensure consistent behavior. Redundant functions have been removed, and the execution logic has been simplified. The associated documentation, including function help, has been updated to reflect these changes accurately.

Customization via System Settings

dataquality.score.external.max.score

Configure the base score for the Data Quality score calculation.

anomaly.detection.default.assignee

Configure the default governance role for anomaly detection if a user wants to assign someone other than the default custodian.

Parameters:

  • Select the role from the specified drop-down list.

dataquality.associatedobjects.files.limit

Configure the maximum number of file objects (files and file columns) that can be associated with a Data Quality Rule.

Parameters:

  • The default value is 20.

  • The minimum allowed value is 1, and the maximum is 250.

  • Enter the value in the provided field.

dataquality.associatedobjects.tables.limit

Configure the maximum number of table objects (tables and table columns) that can be associated with a Data Quality Rule.

Parameters:

  • The default value is 1000.

  • The minimum allowed value is 1, and the maximum is 1000.

  • Enter the value in the provided field.

dataquality.rulescore.calculation.method

Allows users to choose how the Data Quality Score is calculated: Object-Based or Row-Based.

Parameters:

  • The default value is Object-Based.

  • If Object-Based: Score is calculated based on how many rules each object passes.

  • If Row-Based: Score is calculated based on how many rows meet the rule conditions.

dq.execution.connection.validation

Enable or disable the connection health check during Data Quality rule execution.

Parameters:

  • The default value is false.

  • Set to true to validate the connection during rule execution.

dataquality.incremental.supported.datatypes

Define the list of data types used to identify incremental data in tables during data quality rule execution.

Functionality:

Only newly added or modified records, identified using a tracking column of one of these data types, are considered during rule execution.

Supports diverse source systems by allowing customization of acceptable date/time data types for incremental filtering.

Parameters:

  • The default value is timestamp, date, datetime, smalldatetime.

  • To support additional or non-standard data types used in your source systems, append them as comma-separated values (see the example below).
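
For example, a PostgreSQL source that stores change times in a timestamptz column could be supported by setting the value to (illustrative):

timestamp, date, datetime, smalldatetime, timestamptz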

Advanced Jobs

Sync DQ Policy Execution Summary to OE

This job synchronizes the policy execution summary results to OvalEdge (OE).

Load Dataset For Data Quality

This job loads a dataset from an XLSX file to associate data quality rules.

