Creating a Data Quality Rule

Users with the Author role can create data quality rules by associating them with multiple data objects.

Note: The other methods for creating data quality rules are Rule Recommendations, OvalEdge APIs, and Load Metadata from Files.

Add Rule

On the Data Quality Rule landing page, click the "+" icon to initiate the creation of a data quality rule.

Selecting Object Type

When creating a new data quality rule, users must first select an Object Type. This essentially defines the kind of data the rule will assess. Here's an overview of the supported Object Types to help users choose the right one:

  • Tables: If the rule targets the overall quality of an entire table within the data set, select "Tables."

  • Table Columns: For rules focusing on specific data points within a table (e.g., checking for missing values or invalid entries in a particular column), choose "Table Columns."

  • Files: This option is suitable for rules that evaluate the quality of entire files within the data storage.

  • File Columns: Similar to table columns, "File Columns" allows users to create rules that assess specific data points within a file (e.g., verifying data format or ensuring consistency across multiple files).

  • Codes: This Object Type is used for rules that evaluate the quality of custom code snippets used in data management processes.

Selecting the appropriate object type is crucial for users in creating effective data quality rules. It ensures the rule is applied to the intended data and provides targeted checks for the specific data points users want to monitor.

Selecting Function

Once users select the object type for their data quality rule, OvalEdge assists them in choosing the proper function.

Here's how it works:

  • Function Selection: Based on the chosen object type (tables, columns, files, etc.), OvalEdge presents a relevant list of pre-built data quality functions.

  • Function Guidance: Each function has a handy tooltip that appears on hover, providing users with a brief explanation of its purpose. This quick info helps users understand what each function does.

  • Deeper Dives: For a more comprehensive understanding, users can click the "See More" option within the tooltip. This directs them to the OvalEdge Help Center, where detailed explanations and use cases for each function are available.

By offering pre-built functions, tooltips, and comprehensive Help Center resources, OvalEdge allows users to select the most appropriate function for their specific data quality needs. This ensures their rules effectively evaluate the chosen data points.

Configuring Function Inputs

Each Data Quality Function has a set of inputs used to configure its criteria. These comprise the Input and Success Criteria fields:

  • Input: This field allows users to specify criteria for rule execution by providing number, text, date, or comma-separated values. The Input Operators add flexibility and help obtain results based on the specified criteria. Note: The Input field varies by function; refer to the OvalEdge_Data Quality Functions document to determine which functions it applies to.

  • Success Criteria: Users can define Success Criteria by setting an acceptable range or value. If the result of the data quality rule falls within this range, the execution is deemed successful; otherwise, it is considered failed. Success Criteria are evaluated as a count or a percentage, which serves as the Validation Mode for Data Quality Rule results. Both the Input and Success Criteria fields offer various operators to facilitate configuration.

    • Operator Labels: Equal To, Matching, Not Equal To, Less Than, Less Than or Equal To, Greater Than, Greater Than or Equal To, In Between, Not In Between, Starting With, Ending With, Is Empty, Is Null, Is Not Empty, Is Not Null, and Containing.

    • Success Validation Mode: Count and Percent.
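The count and percent validation modes described above can be sketched as follows. This is a hypothetical illustration of how such a check might work, not OvalEdge's implementation; the function name, operator handling, and the `(low, high)` pair for "In Between" are assumptions.

```python
def passes_success_criteria(failed, total, operator, threshold, mode="Percent"):
    """Return True if the rule result satisfies the Success Criteria.

    mode is the Success Validation Mode: "Count" compares the raw number
    of passing rows, "Percent" compares the passing percentage.
    """
    passed = total - failed
    result = passed if mode == "Count" else (passed / total * 100 if total else 0.0)

    if operator == "Equal To":
        return result == threshold
    if operator == "Greater Than or Equal To":
        return result >= threshold
    if operator == "Less Than":
        return result < threshold
    if operator == "In Between":
        low, high = threshold  # threshold assumed to be a (low, high) pair
        return low <= result <= high
    raise ValueError(f"Unsupported operator: {operator}")
```

For example, a rule configured with "Result Value Percent Equal To 100" passes only when no rows fail.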

Associating Objects

  • Depending on the chosen object type (tables, table columns, files, file columns, codes), users can add the data quality rule to the desired objects where they want the rule to be applied. The Add Object widget enables users to select the data objects and associate the rule with them.

Configuring Rule Name

  • Each data quality rule requires a unique name. The Rule Name field enables users to define the name of the data quality rule.

Configuring Rule Purpose

  • The Purpose field is mandatory, serving to explain the intent and rationale behind creating a specific Data Quality Rule. Users can include essential information related to the rule in this field.

Selecting Dimension

Data quality dimensions are specific aspects used to evaluate and quantify the quality of data within a dataset. Each Data Quality Function is linked with a default dimension, allowing users to categorize created rules based on that dimension.

OvalEdge has eight system-defined dimensions, providing a framework for categorizing data quality rules effectively.

  • Accuracy: This dimension measures the degree of correctness and precision of the data. It focuses on eliminating errors, inconsistencies, or inaccuracies that might be present in the data.

  • Completeness: The completeness dimension evaluates the extent to which all required data is present and available. It ensures that there are no missing values or gaps in the dataset, enabling a comprehensive understanding of the data.

  • Consistency: Consistency examines the uniformity and coherence of data across different sources or time periods. It ensures that data is consistent in terms of format, structure, and meaning, regardless of its origin.

  • Timeliness: Timeliness refers to the relevance of data. It ensures that data is up-to-date and available when needed, enabling informed decision-making and analysis based on the most current information.

  • Validity: The validity dimension assesses the extent to which data conforms to predefined rules or standards. It ensures that data meets the specified criteria and is fit for its intended purpose.

  • Uniqueness: Uniqueness focuses on the absence of duplicate or redundant data records. It ensures that each data record is unique and that there are no unnecessary repetitions, which can affect data analysis and processing.

  • Integrity: The integrity dimension represents the overall accuracy, consistency, and reliability of data throughout its lifecycle. It encompasses multiple aspects of data quality, ensuring that data remains accurate and reliable from creation to disposal.

  • Conformity: Conformity measures the degree to which data conforms to standardized data definitions. It ensures that data adheres to predefined formats, structures, and naming conventions, facilitating consistency and interoperability across different systems and processes.

User Defined Dimensions

Users also have the option to define custom dimensions, which can be created under Data Quality > System Settings.

Scheduling Execution

Users assigned the Admin or Steward role can schedule Data Quality Rules using the Schedule toggle button. This feature automates rule execution at set intervals. Users can adjust the schedule after creation, but only while the Data Quality Rule is in the Active state, allowing execution timing and frequency to be adapted as data quality management needs evolve.

Configuring Notifications

This section provides toggle buttons to activate Alerts and Service Requests in the event of a rule failure. Enabling these options notifies users of Data Quality Rule failures and allows OvalEdge to initiate service requests. The configurable options in this section include:

  • Send Alert on Failure: Once activated, this toggle button triggers an inbox alert to the Data Quality Rule Steward. This feature ensures prompt notification whenever a data quality rule fails.

    • Notifications related to rule failures or successes are shown in My Profile > Inbox > Data Quality.

  • Service Requests created on rule failure are displayed in Data Quality > Data Quality Rule Summary > Service Requests.

  • Create Service Request On Failure: Enabling the "Create Service Request on Failure" toggle ensures automatic generation of service requests in response to data quality rule failures on associated objects. Service requests play a crucial role in calculating the Data Quality Score for the associated object. These requests can be accessed from the Service Request tab within the Data Quality Rule. Conversely, turning off this toggle prevents service request generation, leaving the Data Quality Score unaffected.

    • Do Not Report Duplicate Service Requests: This button is visible only when the "Create Service Request on Failure" option is enabled, and hidden when that option is disabled. Its purpose is to prevent duplicate service requests: a new request is created only if there is no existing request in an open state. A request is considered open if its status is anything other than Closed, Rejected, Expired, Deleted, or Resolved.

    • Closed - When a Service Request is closed after its resolution.

    • Rejected - When a Service Request is rejected by the approver.

    • Expired - When a Service Request has surpassed a time limit without being completed or resolved.
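The duplicate-suppression behavior above can be sketched as a simple check. The status names come from this documentation; the function name and inputs are hypothetical, not an OvalEdge API.

```python
# Statuses that mean a service request is no longer open.
RESOLVED_STATUSES = {"Closed", "Rejected", "Expired", "Deleted", "Resolved"}

def should_create_service_request(existing_statuses, suppress_duplicates=True):
    """Create a new service request only if no existing request is still open.

    A request is considered open when its status is anything other than
    Closed, Rejected, Expired, Deleted, or Resolved.
    """
    if not suppress_duplicates:
        return True  # report every failure, including duplicates
    has_open = any(s not in RESOLVED_STATUSES for s in existing_statuses)
    return not has_open
```

With duplicate suppression on, an object whose only prior requests are Closed or Resolved gets a fresh request, while an object with an open request does not.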

Caution Downstream Objects On Failure

Enabling this toggle triggers a caution message on downstream objects, highlighting potential data quality issues resulting from a failed rule.

Updating Steward

Each data quality rule is assigned a steward who oversees the configuration tasks, ensuring compliance with data quality standards for the corresponding objects. Stewards are responsible for setting rule parameters, such as input and success criteria, leveraging their understanding of data quality principles. They also monitor rule execution and resolve any data quality issues. Initially, the creator or owner of the rule is assigned as the default steward, but this can be updated to other users assigned the Admin role.

Use Case

In a customer relationship management (CRM) system, ensuring accurate and valid email addresses is crucial for effective communication and data integrity. Implementing a data quality rule around an email format check function helps maintain high-quality data within the CRM system.

  • Solution: In this scenario, users can create a rule with the following settings:

  • Object Type: Table Column

  • Function: Valid Email Format

  • Success Criteria: Result Value Percent should be Equal To 100

  • Associate Objects: Associate the table columns that should go through this validation

  • Rule Name: Email Validation

  • Rule Purpose: This rule validates email addresses for correct formatting, ensuring data consistency and accuracy

  • Dimension: Validity

Configuring the rule this way ensures that every value in the specified column matches the email format with 100% validity.
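The use case above can be sketched in code. The regular expression is an illustrative approximation of an email format check, not OvalEdge's exact "Valid Email Format" pattern, and the function name is hypothetical.

```python
import re

# Approximate email pattern: local part, "@", domain with a dot-separated TLD.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def email_rule_passes(values):
    """Pass only when 100% of values match the email format
    (Success Criteria: Result Value Percent Equal To 100)."""
    if not values:
        return True
    valid = sum(1 for v in values if EMAIL_RE.match(v or ""))
    return valid / len(values) * 100 == 100
```

A single malformed address in the associated column makes the rule fail, which is exactly what the "Equal To 100" success criterion expresses.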

Accessing Data Quality Rules

The Data Quality Rules landing page displays a list of all current rules. Users can click on any rule name to access the specific rule, which redirects them to the summary tab for that particular data quality rule.

Dashboard Tab

The Rule Execution Dashboard provides a quick view of how a data quality rule performed in its most recent run and over the last few runs. It displays both object-level and row-level results, along with key metrics such as execution time, steward, and service requests.

Top KPIs (Cards)

  • Objects Evaluated: Number of objects (such as tables, columns, or files) that were checked in the last execution.

  • Objects Passed: Number of objects that passed the rule in the last execution. Clicking on this card opens the Object Execution Results page filtered to show only passed objects.

  • Objects Failed: Number of objects that failed the rule in the last execution. Clicking on this card opens the Object Execution Results page filtered to show only failed objects.

  • Rows Evaluated: Total number of rows that were checked in the last execution.

Last Execution Summary

  • Execution Result: Shows whether the last execution passed, failed, or had another outcome.

  • Execution ID: A unique ID for the last execution. Clicking on this ID opens the Rule Executions page filtered by this execution.

  • Rows Sent for Remediation: Number of rows from this execution that were sent to the Remediation Center. Clicking on this value opens the Remediation Center filtered to show these rows.

  • Execution Time: The date and time when the last execution ran.

  • Rule Steward: The steward or owner responsible for this rule.

  • Service Requests Raised: Number of service requests that were created during the last execution. Clicking on this value opens the Service Requests page filtered for this rule.

Charts

  • Object Results: A breakdown of the last execution results at the object level. Results are grouped into: Passed, Failed, Undetermined, and Execution Failed.

  • Object Quality Trend (Last 10 Executions): Shows how object results changed over the last 10 runs. X-axis: Execution IDs; Y-axis: percentage of passed objects (passed objects ÷ total objects × 100).

  • Row Results: A breakdown of the last execution results at the row level, showing passed and failed counts.

  • Row Quality Trend (Last 10 Executions): Shows how row results changed over the last 10 runs. X-axis: Execution IDs; Y-axis: percentage of passed rows (passed rows ÷ total rows × 100).
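The trend calculation used by both charts is the same pass-percentage formula applied per execution. As a minimal sketch (the function name and input shape are assumptions):

```python
def quality_trend(executions, window=10):
    """executions: list of (passed, total) tuples, oldest first.

    Returns the pass percentage (passed / total * 100) for the most
    recent `window` executions, matching the chart's Y-axis values.
    """
    recent = executions[-window:]
    return [round(p / t * 100, 1) if t else 0.0 for p, t in recent]
```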

Summary Tab

This tab consolidates crucial information regarding the configuration of a particular data quality rule.

  • Execution Information: This section provides execution details of the data quality rule. It includes the "Last Status," indicating the last execution status, and the "Last Run Time," displaying the timestamp of the most recent execution.

  • Purpose: The purpose field is mandatory and articulates the objective of the data quality rule. It communicates to users why the rule was created, its validation criteria, and the expected output.

  • Function: Within the summary tab, this section presents details about the chosen data quality function applied to the rule. It displays the function name, inputs, and the selected dimension for the data quality rule.

  • Data Quality Remediation Center: This section is visible on the Data Quality Summary tab only if the selected object type is supported by the Data Quality Remediation Center (DQRC). To modify the fields in this section, the Data Quality Rule must be in the Draft state and the "Add Failed Rows to Remediation Center" toggle button must be activated. Enabling this toggle reports failed values to the remediation center; otherwise, the failed values for the objects associated with the rule are not sent to the DQRC. The following fields, configured under this section, are mapped to the Data Quality Remediation Center:

    • Monetary Value - This field is configured at the Data Quality Rule level and helps users understand the cost of each failed value reported as an independent entry in the Remediation Center.

    • Criticality - The criticality field assists users in understanding the severity of a failed value and its importance. The severity range is categorized into three levels: high, medium, and low.

Note: With the Monetary Value and Criticality Fields, users can grasp the impact of a particular failed value.

  • Violation Message - The purpose of a violation message is to explain why a specific data object failed to meet the specified criteria. It serves important roles for both technical and business users by helping identify the root cause of the issue. The violation message bridges the gap between business and technical users, facilitating better communication and enabling effective data quality management in various scenarios.

  • Corrective Action - This field aims to offer actionable guidance and assistance to users for resolving data quality issues related to reported failed values. It provides practical steps, recommendations, or references to help business users address the specific problems flagged by the data quality rules.

  • Avoid Duplicate Failures: This option lets users determine whether to report duplicate failures to the DQRC. Selecting "Yes" skips reporting duplicate entries, simplifying the remediation process for unique issues. Selecting "No" (default) maintains the current behavior of reporting all failures, including duplicates, for detailed review.

  • Tags: This section lists all the tags associated with the Data Quality Rule. Users can edit these tags when the rule is in the Draft state.

  • Additional Fields: This section presents the Custom Fields added to the Data Quality Rule. Users can edit these fields while the rule is in the Draft state. The History option shows all previous changes related to the custom fields section.

  • Top Users: This section showcases users who have viewed, commented, and collaborated on data quality rules. By default, the Data Quality Steward appears at the top, while other users who have engaged with the rule are listed based on their activity count.

  • Alerts and Service Requests: This section provides toggle buttons to enable Alerts and Service Requests in the event of a Data Quality Rule failure. By activating these options, users receive Inbox notifications for data quality rule failures, and the application can initiate service requests for appropriate actions. The configurable options within this section include:

    • Send Alert on Failure allows users to create a customized message that notifies the steward about the failure of a rule. Users can activate the toggle button to enable the alert feature. Once a rule fails, the assigned user in the approval workflow receives a notification about the failure of the data quality rule.

      • Include Detailed Report in Notifications

        Select this option to control the level of detail included in Data Quality Rule (DQR) Failure Notifications.

        • When the toggle is OFF (default): Notifications display only a concise summary table with execution metrics for quick monitoring.

        • When the toggle is ON: Notifications include both the summary table and a detailed report of failed values with row-level results, sent as an attachment in email notifications, for in-depth analysis.

    • The Create Service Request on Failure option automatically generates a service request whenever a DQR failure occurs in the application.

    • The Do Not Report Duplicate Service Request option prevents the creation of duplicate data quality service requests.

      • It is accessible when users enable the "Create Service Request on Failure" feature. The purpose of this option is to generate a new service request only if there is no existing request in an Open status, thus avoiding duplicate requests.

      • When the "Create Service Request on Failure" feature is disabled, this option is not visible. A Data Quality Service Request is created for an object only in the absence of any open requests; a request is considered open if its status is anything other than Closed, Rejected, Expired, Deleted, or Resolved.

  • Caution Downstream Objects on Failure

Users can configure these options during the creation of a Data Quality Rule and have the flexibility to edit these details in the summary tab of the rule, provided that the rule is in the Draft state.

Associated Objects

This tab offers detailed information about all associated objects for a Data Quality Rule, enabling users to customize settings for each object and execute rules with unique criteria specified for each one.

Table

The table displays the contextual information, object-level configuration, and the last execution details for the respective objects.

  • Contextual Information: In the Associated Objects tab, contextual information for Tables includes details such as Database, Schema, and Table. These columns offer essential context for understanding the selected table. Users can also utilize filter functionality for each of these columns, allowing them to refine and narrow down their results based on specific criteria or preferences.

  • Execution Details:

    • Last Run On: This field displays the timestamp indicating when the Data Quality Rule was last executed. Users can sort the timestamps to view the most recent or oldest executions.

    • Last Object Execution Result: Users can observe the last execution status of the data quality rule and filter rules based on this status. Additionally, a quick view option provides a summary of the last execution result, including Message, Result Value, Input, Success Criteria, Service Request ID, Failed Values sent to Remediation Center, and Downstream Objects Cautioned.

    • History: Selecting this option directs users to the Object Execution Results tab, where detailed execution information specific to a particular data object is presented.

    • Advanced Settings: This allows users to configure additional settings for each object associated with the rule. (Please refer to the Advanced Settings section.)

    • Add Objects: This tab also offers users the ability to add multiple objects belonging to the supported Data Quality Function.

Table Columns

The table column displays the contextual information, object-level configuration, and the last execution details for the respective objects.

  • Contextual Information: In the Associated Objects tab for Table Column, contextual information includes details such as Database, Schema, Table, and Column. These columns provide essential context for understanding the selected table columns. Users can also utilize filter functionality for each of these columns, enabling them to refine and narrow down their results based on specific criteria or preferences.

  • Max Failed Values Limit: This column displays the object-level configuration option for setting the maximum number of failed values reported to the Data Quality Remediation Center.

  • Execution Details:

    • Last Run On: This field displays the timestamp indicating when the Data Quality Rule was last executed. Users can sort the timestamps to view the most recent or oldest executions.

    • Last Object Execution Result: Users can observe the last execution status of the data quality rule and filter rules based on this status. Additionally, a quick view option provides a summary of the last execution result, including Message, Result Value, Input, Success Criteria, Service Request ID, Failed Values sent to Remediation Center, and Downstream Objects Cautioned.

    • History: Selecting this option directs users to the Object Execution Results tab, where detailed execution information specific to a particular data object is presented.

    • Primary Columns: Including primary columns assists in identifying failed values effectively. This approach showcases the failed value alongside the selected primary columns, aiding in pinpointing the specific records causing the issue.

    • Additional Columns: Incorporating additional columns enhances the identification of failed values. This allows for displaying the failed value alongside the selected additional columns, facilitating clearer identification of problematic data.

    • Advanced Settings: This allows users to configure additional settings for each object associated with the rule. (Please refer to the Advanced Settings section.)

    • Add Objects: This tab also offers users the ability to add multiple objects belonging to the supported Data Quality Function.

Files

The files display the contextual information, object-level configuration, and the last execution details for the respective objects.

The Associated Objects tab for files presents the following information.

  • Contextual Information: In the Associated Objects tab for Files, contextual information includes details such as File System, File Name, and File Location. These columns offer essential context for understanding the selected files. Users can also utilize filter functionality for each of these columns, enabling them to refine and narrow down their results based on specific criteria or preferences.

  • Execution Details:

    • Last Run On: This field displays the timestamp indicating when the Data Quality Rule was last executed. Users can sort the timestamps to view the most recent or oldest executions.

    • Last Object Execution Result: Users can view the last execution status of the respective DQRs. This column also lets users search DQRs based on their last execution status.

    • History: Selecting this option directs users to the Object Execution Results tab, where comprehensive execution information specific to a particular data object is presented.

File Columns

The file columns display contextual information, object-level configuration, and the last execution details for the respective objects.

  • Contextual Information: Contextual information for the Associated Objects tab for File Column includes details such as File System, File Name, File Column Name, and File Location. These columns offer essential context for understanding the selected file columns. Additionally, users have the option to utilize filter functionality for each of these columns, enabling them to refine and narrow down their results based on specific criteria or preferences.

  • Max Failed Values Limit: This column serves as an object-level configuration option for setting the maximum number of failed values reported to the Data Quality Remediation Center (DQRC). It is exclusively visible for Table Columns and File Columns object types. Presently, support for reporting failed values to the Data Quality Remediation Center is available only for Table Columns.

  • Execution Details:

    • Last Run On: This field displays the timestamp indicating when the Data Quality Rule was last executed. Users can sort the timestamps to view the most recent or oldest executions.

    • Last Object Execution Result: Users can view the last execution status of the respective DQRs. This column also lets users search DQRs based on their last execution status.

  • Root Path - Displays the location of the file column.

    • Where Column 1 - This allows users to select the column.

    • Where Predicate 1 - This field helps in selecting the operator.

    • Where Value 1 - This field is for entering the value.

  • History: Selecting this option directs users to the Object Execution Results tab, where comprehensive execution information specific to a particular data object is presented.

Code

The code displays contextual information, object-level configuration, and the last execution details for the respective objects.

  • Contextual Information: Contextual information for the Associated Objects tab includes details such as Database, Rule Query, Results Stats Query, and Failed Data Query. Additionally, users have the option to utilize search functionality for each of these columns, enabling them to refine and narrow down their results based on specific criteria or preferences.

  • Execution Details:

    • Last Run On: This field displays the timestamp indicating when the Data Quality Rule was last executed. Users can sort the timestamps to view the most recent or oldest executions.

    • Last Result: Users can view the last execution status of the respective DQRs. This column also lets users search DQRs based on their last execution status.

    • History: Selecting this option directs users to the Object Execution Results tab, where comprehensive execution information specific to a particular data object is presented.

Advanced Settings

This column is displayed for table, table column, file, and file column functions and allows users to configure additional settings for each object associated with the rule.

  • Filter: Provides essential features for Object level configuration.

  • Parameterised Query: The text box within the Filter tab enables users to adjust the object-level Data Quality Function queries by adding a where clause. This acts as a parameterised query, filtering data quality rule execution based on specific conditions.

  • Editing Object-level Data Quality Function Queries: The editing icon, located next to the object-level queries, permits users to modify the DQF query for that specific object. This is useful when users anticipate different configurations or execution results for a particular data object.

  • Function Inputs: This tab lists the Input and Success Criteria fields that are to be configured for the particular object.

    • Input

    • Success Criteria

The Violation Message field displayed below shows the object-level violation message, which, once configured, is reflected under the DQRC.
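The parameterised-query behavior described above can be sketched as wrapping the base function query with a user-supplied where clause. This is a hypothetical illustration; the function name, the subquery-wrapping approach, and the example queries are assumptions, not OvalEdge's implementation.

```python
def apply_object_filter(base_query, where_clause=None):
    """Append an object-level WHERE clause to a DQF query, if one is set.

    The base query is wrapped in a subquery so the filter applies to its
    full result set.
    """
    if not where_clause:
        return base_query  # no object-level filter configured
    return f"SELECT * FROM ({base_query}) dq WHERE {where_clause}"
```

For example, filtering a rule's execution to European customers might pass `"region = 'EU'"` as the where clause, so the rule evaluates only the matching rows.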
