Importance Score Calculation

This article explains how the importance score is calculated from Profiling, Lineage (Downstream), and Relationships.

The importance score for Profiling and Lineage (Downstream) applies to all Data Catalog objects, but the importance score for Relationships can only be calculated for Tables and Table Columns.

Table level

Three factors increase the importance score:

Profiling
Lineage (Downstream)
Relationships

The final importance score is the sum of these three components.

Profiling
When a table is profiled, its importance score increases automatically based on the following calculation:
200 x (No. of rows in the table / Max row count in the schema) + (No. of columns x 0.1)
Lineage (Downstream)
1. The importance score increases based on the table's downstream lineage (Destination lineage).
2. Each downstream object increases the score by 7.
3. Deleting an object from the downstream lineage (Destination lineage) decreases the score by 7.
Relationships
The importance score increases based on the table's relationships.
1. Each relationship increases the score by 3.
2. For linked objects, the score increases by 3 for both the Table and the Table Columns.
3. Deleting a relationship decreases the score by 3.
4. If the table has Primary Key (PK) and Foreign Key (FK) relations, the score increases according to the number of those relationships.
5. The score increases regardless of how the relationship was added (manual, crawl, advanced jobs, or pattern).
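
The table-level rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's actual implementation: the class and function names (TableStats, table_profiling_score, table_importance_score) are hypothetical, the sketch assumes the table has already been profiled, and the handling of a schema whose maximum row count is 0 is an assumption that the article does not cover.

```python
from dataclasses import dataclass

# Weights taken from the rules above; the constant names are illustrative only.
LINEAGE_WEIGHT = 7        # per downstream (Destination lineage) object
RELATIONSHIP_WEIGHT = 3   # per relationship


@dataclass
class TableStats:
    """Hypothetical container for the inputs the calculation needs."""
    row_count: int
    column_count: int
    max_row_count_in_schema: int
    downstream_objects: int
    relationships: int


def table_profiling_score(stats: TableStats) -> float:
    """200 x (rows / max rows in schema) + (columns x 0.1)."""
    # Assumption: if the schema has no rows at all, the row-based part is 0.
    if stats.max_row_count_in_schema <= 0:
        row_part = 0.0
    else:
        row_part = 200 * (stats.row_count / stats.max_row_count_in_schema)
    return row_part + stats.column_count * 0.1


def table_importance_score(stats: TableStats) -> float:
    """Sum of the Profiling, Lineage (Downstream), and Relationships parts."""
    lineage_part = stats.downstream_objects * LINEAGE_WEIGHT
    relationship_part = stats.relationships * RELATIONSHIP_WEIGHT
    return table_profiling_score(stats) + lineage_part + relationship_part


# Made-up input values, purely for illustration.
example = TableStats(
    row_count=500,
    column_count=10,
    max_row_count_in_schema=1000,
    downstream_objects=2,
    relationships=1,
)
print(round(table_importance_score(example), 2))
```

Keeping the two weights as named constants makes it easy to see that adding or deleting one downstream object changes the total by 7, and adding or deleting one relationship changes it by 3.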

Table Column level

Profiling
When a table is profiled, its columns are profiled automatically. Calculation:
100 x (column distinct count / table row count)
Lineage (Downstream) and Relationships
Table columns follow the same calculation as tables.
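
As a rough illustration of the column-level calculation, the sketch below combines the column profiling formula with the same lineage and relationship weights used for tables. The function names and input values are hypothetical, and returning 0 for an empty table is an assumption, since the article does not say how that case is handled.

```python
def column_profiling_score(distinct_count: int, table_row_count: int) -> float:
    """100 x (column distinct count / table row count)."""
    # Assumption: a column in an empty table gets no profiling score.
    if table_row_count <= 0:
        return 0.0
    return 100 * (distinct_count / table_row_count)


def column_importance_score(
    distinct_count: int,
    table_row_count: int,
    downstream_objects: int,
    relationships: int,
) -> float:
    """Column profiling part plus the table-style lineage and relationship parts."""
    lineage_part = downstream_objects * 7      # 7 per downstream object
    relationship_part = relationships * 3      # 3 per relationship
    profiling_part = column_profiling_score(distinct_count, table_row_count)
    return profiling_part + lineage_part + relationship_part


# Made-up example: 25 distinct values in a 100-row table,
# 1 downstream object and 2 relationships on the column.
print(column_importance_score(25, 100, 1, 2))
```

Under this formula, a column whose values are all distinct (for example, a primary key) receives the maximum profiling contribution of 100.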

Importance score calculation example for Table

Let’s consider a Table (Customer_Sales) with 100 rows and 20 columns. It has 3 downstream objects and 4 relationships. The maximum row count of any table in that Schema is 1000. Now, let’s calculate the Profiling, Lineage (Downstream), and Relationships scores, and then the overall importance score.

The importance score for the Table (Customer_Sales) is the sum of all three components (Profiling, Lineage, and Relationships).

Profiling: 200 x (100/1000) + (20 x 0.1) = 22
Lineage (Downstream): 3 x 7 = 21
Relationships: 4 x 3 = 12

So, the importance score for the Table (Customer_Sales) is 22 + 21 + 12 = 55
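
The Customer_Sales arithmetic can be checked with a short script. It is only a sketch that plugs the example's numbers into the formulas from this article; the variable names are illustrative. It also shows the effect of deleting one downstream object, which lowers the total by 7.

```python
# Inputs from the Customer_Sales example.
rows, columns, max_rows_in_schema = 100, 20, 1000
downstream_objects, relationships = 3, 4

# The three components of the table importance score.
profiling = 200 * (rows / max_rows_in_schema) + columns * 0.1
lineage = downstream_objects * 7
relationship_score = relationships * 3

total = profiling + lineage + relationship_score
print("Profiling:", round(profiling, 2))
print("Lineage (Downstream):", lineage)
print("Relationships:", relationship_score)
print("Importance score:", round(total, 2))

# Deleting one downstream object removes 7 from the lineage component.
total_after_delete = profiling + (downstream_objects - 1) * 7 + relationship_score
print("After deleting one downstream object:", round(total_after_delete, 2))
```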

Business Use Cases for Importance Score

Prioritizing Curation: Focus curation efforts on tables and columns with the highest importance scores to enhance data literacy and usability. This ensures that critical datasets are well-documented, accurate, and accessible to business users.
Impact Analysis: Leverage importance scores to evaluate downstream risks before implementing schema or data changes. This helps mitigate potential disruptions to dependent processes or applications.
Informed Decisions: Identify and highlight pivotal datasets with high importance scores to drive accurate, data-informed decisions. Such datasets are essential for key analyses, reporting, and strategic planning.
Policy Alignment: Use importance scores to apply governance policies effectively to datasets with extensive business usage. High-score datasets can be prioritized for compliance, security, and quality controls.
Team Collaboration: Foster collaboration by highlighting datasets with shared business significance and high importance scores. This encourages cross-functional teams to engage with the most impactful data for their initiatives.
