Data Quality Score

The Data Quality Score lets users evaluate the data quality of individual data objects. It is calculated from several factors, and the calculation can be triggered by profiling the object or by running data quality rules on it.

Data Quality Rule Score: This score reflects the results of data quality rules executed on the specific object. The better the data adheres to the rules, the higher the score.
Profile Score: This score is based on the null density (percentage of missing values) identified during the profiling process. A lower null density translates to a higher profile score.
Child Score: For objects with hierarchical relationships (e.g., a table column being a child of a table), the data quality scores of these child objects are factored in.
Service Request Score: This score acts as a reduction factor. It considers data quality service requests raised and resolved on the object. A higher number of unresolved issues lowers the score.

The Data Quality Score combines these factors into a single weighted score, with the Service Request Score subtracted as a reduction, giving users a clear picture of their data's health.

Data Quality Rule Score

Data Quality Rules in OvalEdge define the criteria that data must meet, and the applied data quality functions determine whether the data meets these criteria or not. These rules contribute significantly to the overall Data Quality Score.

OvalEdge supports two scoring methods:

Rule-based scoring: The score is calculated based on the number of rules that pass or fail.
Row-based scoring: The score is calculated by evaluating the number of rows that pass or fail data quality checks, providing a more accurate representation of data quality.

Row-based scoring improves visibility into data health: even when multiple rules fail on a table, it still produces a measurable score instead of displaying no score.

📊 Example: Column = Customer.Email

1. Row-based Calculation (by data rows)

Formula:

((Passed Rows ÷ Total Rows) × 100) × (Rule Weightage ÷ 100)

Total Rows = 1,000
Passed Rows = 950
Rule Weightage = 25 (default)

((950 ÷ 1,000) × 100) × (25 ÷ 100)

= (95 × 0.25)

= 23.75

✅ Row-based Score = 23.75 (of a possible 25, the rule weightage)

2. Rule-based Calculation (by rules)

Formula:

((Passed Rules ÷ Total Rules) × 100) × (Rule Weightage ÷ 100)

Total Rules on Customer.Email = 3
Not Null → ✅ Passed
Format Check → ❌ Failed
Uniqueness → ✅ Passed
Passed Rules = 2
Rule Weightage = 25 (default)

((2 ÷ 3) × 100) × (25 ÷ 100)

= (66.7 × 0.25)

= 16.7

✅ Rule-based Score = 16.7 (of a possible 25, the rule weightage)

🔎 Final Result

Row-based Score = 23.75
Rule-based Score = 16.7
The rule score can never exceed its weightage, which keeps its contribution to the overall 0–100 DQ Score bounded.
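Both scoring methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not OvalEdge's implementation: the function names are hypothetical, the weightage defaults to 25 as described above, and returning 0 when nothing was evaluated is an assumption.

```python
def row_based_score(passed_rows: int, total_rows: int, weightage: float = 25) -> float:
    """((Passed Rows / Total Rows) x 100) x (Rule Weightage / 100)."""
    if total_rows == 0:
        return 0.0  # assumption: no evaluated rows contributes nothing
    return passed_rows * 100 / total_rows * weightage / 100


def rule_based_score(rule_results: dict[str, bool], weightage: float = 25) -> float:
    """((Passed Rules / Total Rules) x 100) x (Rule Weightage / 100)."""
    if not rule_results:
        return 0.0  # assumption: no rules contributes nothing
    passed = sum(1 for ok in rule_results.values() if ok)
    return passed * 100 / len(rule_results) * weightage / 100


# The Customer.Email example: three rules, one of which fails.
email_rules = {"Not Null": True, "Format Check": False, "Uniqueness": True}
print(row_based_score(950, 1000))               # 23.75
print(round(rule_based_score(email_rules), 1))  # 16.7
```

Both functions apply the same weightage cap, so neither can return more than the configured rule weightage.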

Profile Score

The Profile Score provides a quick assessment of data quality during profiling, even without data quality rules in place. This score focuses on the profiled object's null density (percentage of missing values). Here's how it works:

Calculation: Null density plays a key role in the calculation. A higher null density indicates a larger proportion of missing values, resulting in a lower score. Conversely, a lower null density indicates good data completeness, resulting in a higher score.
Weighting: By default, the profile score contributes 25% to the overall data quality score.
Dynamic Updates: The score automatically recalculates when the object's null density changes or when the object is profiled again, ensuring it reflects the latest data quality.

The Profile Score provides a valuable indicator of how well the data is populated, offering insights into potential data quality issues.
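A minimal sketch of the profile score, assuming null density is the percentage of `None` values in a profiled column and the default 25% profile weightage. The helper names and the sample column are hypothetical.

```python
from typing import Any, Sequence


def null_density(values: Sequence[Any]) -> float:
    """Percentage of missing (None) values in a profiled column."""
    if not values:
        return 0.0  # assumption: an empty column is treated as having no nulls
    missing = sum(1 for v in values if v is None)
    return missing * 100 / len(values)


def profile_score(null_density_pct: float, weightage: float = 25) -> float:
    """(100 - Null Density%) x (Profile Weightage / 100)."""
    return (100 - null_density_pct) * weightage / 100


# Hypothetical column: 2 of 10 values are missing, so null density is 20%.
column = ["a", None, "b", "c", None, "d", "e", "f", "g", "h"]
print(profile_score(null_density(column)))  # 20.0
```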

Child Score

The Child Score assesses the quality of related objects, such as table columns within a table. It plays a role in the overall data quality score only when child objects are present. Here's how it works:

Weightage: When child objects exist, their quality contributes a default weightage of 50% to the data quality score.
No Child Objects: If no child objects are associated with the main object, the child score weightage becomes 0%.
Weight Redistribution: In the absence of child objects, the 50% weightage originally allocated to the child score is redistributed to the Data Quality Rule Score. This increases the weightage of Data Quality Rules from the default 25% to 75%.

The Child Score adds a layer of evaluation for objects with hierarchical structures. When there are no child objects, its share of the weightage moves to the Data Quality Rule Score, so adherence to data quality rules carries most of the weight.
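The child score and the weight redistribution can be sketched as two small functions. This is a hedged illustration: the function names, dictionary keys, and the five child scores in the usage line are hypothetical, and the defaults are the 25/25/50 weightages described above.

```python
def child_score(child_dq_scores: list[float], child_weightage: float = 50) -> float:
    """Average DQ Score of the child objects x (Child Weightage / 100)."""
    if not child_dq_scores:
        return 0.0  # no children: the child component contributes nothing
    return sum(child_dq_scores) / len(child_dq_scores) * child_weightage / 100


def effective_weights(has_children: bool, rule: float = 25, profile: float = 25,
                      child: float = 50) -> dict[str, float]:
    """Component weightages; without children the child share moves to the rule score."""
    if has_children:
        return {"rule": rule, "profile": profile, "child": child}
    return {"rule": rule + child, "profile": profile, "child": 0}


print(child_score([40, 60, 50, 45, 55]))  # 25.0
print(effective_weights(False))           # {'rule': 75, 'profile': 25, 'child': 0}
```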

Service Request Score

The Service Request Score plays a crucial role in the Data Quality Score by acting as a reduction factor. It considers the impact of unresolved data quality issues on a specific data object. Here's how it works:

Function: Service requests are created when a Data Quality Rule fails. This can happen automatically based on rule settings or manually by users. These requests essentially serve as tickets, notifying data stewards (responsible parties) about potential data quality issues.
Impact on Score: The Service Request Score has a default negative weightage of 25% (configurable in "System Settings"). The higher the share of unresolved service requests (outstanding issues) on the object, the more the overall data quality score is reduced; fewer unresolved requests mean a smaller reduction.

The Service Request Score encourages proactively addressing data quality issues and helps users improve data health by resolving service requests.
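A sketch of the Service Request Score, using the open-to-total ratio given under "How is the Data Quality Score Calculated?" and the 25% default weightage. Representing each request as a boolean "is open" flag is an assumption made for illustration, as is returning no reduction when there are no requests.

```python
def service_request_score(open_flags: list[bool], weightage: float = 25) -> float:
    """((Open / Total) x 100) x (Weightage / 100); the result is subtracted from the DQ Score."""
    if not open_flags:
        return 0.0  # assumption: no service requests means no reduction
    open_count = sum(open_flags)  # True counts as 1
    return open_count * 100 / len(open_flags) * weightage / 100


# Hypothetical object: 3 open and 7 resolved service requests.
print(service_request_score([True] * 3 + [False] * 7))  # 7.5
```

Resolving requests shrinks `open_count`, which is how resolving tickets raises the overall score.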

Configurations

The weightage of the scores can be configured in “System Settings > Data Quality.”

| Type of Score | Parameter | Default Weightage (%) | Default Weightage, if no Child (%) |
|---|---|---|---|
| Rule Score | dq.dashboard.dqrscore.weightage | 25 | 75 |
| Profile Score | dq.dashboard.profilescore.weightage | 25 | 25 |
| Child Score | dq.dashboard.childscore.weightage | 50 | 0 |
| Service Request Score | dq.dashboard.srscore.weightage | 25 | 25 |

The weightages of the DQ Rule Score, Profile Score, and Child Score should sum to 100. The Service Request Score weightage is applied separately as a reduction factor.
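Because the three additive weightages must total 100, a quick consistency check on a set of settings can be sketched as below. The parameter names come from the configuration table; holding them in a Python dict, and the check itself, are illustrative assumptions rather than OvalEdge behavior.

```python
ADDITIVE_KEYS = (
    "dq.dashboard.dqrscore.weightage",
    "dq.dashboard.profilescore.weightage",
    "dq.dashboard.childscore.weightage",
)


def weightages_valid(settings: dict[str, float]) -> bool:
    """True if the Rule, Profile, and Child weightages sum to 100.

    dq.dashboard.srscore.weightage is a reduction factor, so it is not included.
    """
    return sum(settings[key] for key in ADDITIVE_KEYS) == 100


defaults = {
    "dq.dashboard.dqrscore.weightage": 25,
    "dq.dashboard.profilescore.weightage": 25,
    "dq.dashboard.childscore.weightage": 50,
    "dq.dashboard.srscore.weightage": 25,
}
print(weightages_valid(defaults))  # True
```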

How is the Data Quality Score Calculated?

The Data Quality Score is calculated from four components. The first three are added together, and the Service Request Score is subtracted from their sum:

Data Quality Rule score
Profile Score
Child Score
Service Request Score (Reduction Factor)

Each component score is calculated first, using the following formulas.

Data Quality Rule Score:

Rule-based: ((Passed Rules ÷ Total Rules) × 100) × (Data Quality Rule Weightage ÷ 100)

Row-based: ((Passed Rows ÷ Total Rows) × 100) × (Data Quality Rule Weightage ÷ 100)

Profile Score:

(100 - Null Density%) × (Profile Weightage ÷ 100)

Child Score:

(Average Data Quality Score of the child objects) × (Child Weightage ÷ 100)

Note: If no child objects are present, the child score weightage (50%) is added to the DQ rule score weightage (25%), so the rule score carries a weightage of 75%.

Service Request Score:

((Open Service Requests ÷ Total Service Requests) × 100) × (Service Request Score Weightage ÷ 100)

Overall Score:

(Data Quality Rule Score + Profile Score + Child Score) - (Service Request Score)

The formula to calculate the Data Quality Score is shown below with the corresponding default weightages:

If child objects are present: Profile Score (25%) + Data Quality Rule Score (25%) + Child Score (50%) - Service Request Score (25%)
If child objects are not present: Profile Score (25%) + Data Quality Rule Score (75%) - Service Request Score (25%)
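The formula can be combined into one function. This minimal sketch assumes the percentages are already computed (rule or row pass rate, null density, open service request rate) and uses the default weightages; the function and argument names are hypothetical, and no clamping to 0–100 is applied because the formula does not specify one.

```python
from typing import Optional


def dq_score(pass_pct: float, null_density_pct: float, child_avg: Optional[float],
             open_sr_pct: float, rule_w: float = 25, profile_w: float = 25,
             child_w: float = 50, sr_w: float = 25) -> float:
    """(Rule + Profile + Child) - Service Request, with percentage inputs (0-100)."""
    if child_avg is None:
        # No child objects: the child weightage is moved to the rule score.
        rule_w, child_w, child_avg = rule_w + child_w, 0, 0.0
    rule = pass_pct * rule_w / 100
    profile = (100 - null_density_pct) * profile_w / 100
    child = child_avg * child_w / 100
    reduction = open_sr_pct * sr_w / 100
    return rule + profile + child - reduction


# Hypothetical table with no children: 8 of 10 rules pass, 10% nulls, 1 of 4 requests open.
print(dq_score(pass_pct=80, null_density_pct=10, child_avg=None, open_sr_pct=25))  # 76.25
```

Passing `child_avg=None` exercises the redistribution path (75% rule weightage); passing a number uses the 25/25/50 split.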

Example

Object Type: Table
Number of Data Quality Rules Associated: 10
Number of Passed Rules: 7
Number of Failed Rules: 3
Number of Open Service Requests: 3
Total Number of Service Requests: 10
Number of Child Objects: 5
Average Data Quality Score of Child Objects: 50
Null Density Percentage: 20

Data Quality Rule Score Calculation

((7 ÷ 10) × 100) × (25%) = 17.5

Profile Score Calculation

(100 - 20) × (25%) = 20

Child Score Calculation

(50) × (50%) = 25

Service Request Score Calculation

((3 ÷ 10) × 100) × (25%) = 7.5

Data Quality Score = (17.5 + 20 + 25) - (7.5) = 55

Data Quality Dimension Score

The Data Quality Dimension Score measures data quality along individual dimensions, such as Completeness or Validity. The specific dimensions and criteria used to calculate the score can vary depending on the type of data being evaluated.

The dimension score is similar to the Data Quality score, but it is calculated at the dimension level.

The formula to calculate the Dimension score is:

Data Quality Dimension Score = (Data Quality Rule Score + Profile Score + Child Score) - (Service Request Score), calculated for the given dimension

Note: The Data Quality Dimension Score considers only the service requests generated by the application and disregards those created manually by users, since manually created service requests cannot be associated with a dimension.
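A hedged sketch of how service requests could be filtered before computing one dimension's Service Request Score: only application-generated requests tied to that dimension are counted. The `ServiceRequest` record, its fields, and the sample data are hypothetical, and the 25% default weightage is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceRequest:
    is_open: bool
    auto_generated: bool       # raised by the application when a DQ rule failed
    dimension: Optional[str]   # manual requests have no dimension


def dimension_sr_score(requests: list[ServiceRequest], dimension: str,
                       weightage: float = 25) -> float:
    """Service Request Score for one dimension, ignoring manually created requests."""
    relevant = [r for r in requests if r.auto_generated and r.dimension == dimension]
    if not relevant:
        return 0.0  # assumption: no relevant requests means no reduction
    open_count = sum(1 for r in relevant if r.is_open)
    return open_count * 100 / len(relevant) * weightage / 100


requests = [
    ServiceRequest(is_open=True, auto_generated=True, dimension="Completeness"),
    ServiceRequest(is_open=False, auto_generated=True, dimension="Completeness"),
    ServiceRequest(is_open=True, auto_generated=False, dimension=None),  # manual: ignored
    ServiceRequest(is_open=True, auto_generated=True, dimension="Validity"),
]
print(dimension_sr_score(requests, "Completeness"))  # 12.5
```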

Viewing Data Quality Score

In the OvalEdge application, a user can access the data quality score of a specific data object from the following modules:

Data Catalog

Navigate to the Data Object > Summary and click the “View Dashboard” option.

The View Scores option displays the calculations and formulas used to obtain the Overall Score, Data Quality Rules Score, Child Score, Profile Score, and Service Request Score.

Dashboard

Navigate to Dashboard and click on “Data Quality Scores”.

Home View: Provides a high-level overview of data quality scores for various schemas at the connector level.
Donut charts represent the aggregate score for all child objects within a schema.
Hovering over an object shows a tooltip with its name and last updated timestamp.
Clicking an object name leads to the Tree View for detailed exploration.

Tree View: A granular breakdown of data quality scores for individual elements like tables, columns, files, and file columns. It assesses data along the following dimensions:
Integrity: Checks data security and protection.
Timeliness: Ensures data is up-to-date and relevant.
Uniqueness: Evaluates if records are distinct within the dataset.
Accuracy: Verifies data is error-free and reflects the true value.
Completeness: Determines if all necessary information is present.
Conformity: Checks data alignment with predefined standards.
Validity: Assesses data relevance for the intended use.
Consistency: Ensures internal coherence and adherence to system rules.
Users can select specific tables and files within the Tree View to view detailed data quality scores on the right side.
