Entity Relationships

Building relationships in OvalEdge

To get entity relationships, the user will need to crawl the schema through connector crawler settings by checking the Relationship box, then the user can see the PK (primary key) & FK (foreign key) relations in ER (Entity Relationship) diagrams and tabular view.

After Crawling and profiling the Particular Schema, select the table = to see the relationships, then click on the Entity-Relationships tab.

How are ER diagrams generated (table level)?

An entity relationship displays the logical relationship between a table and other tables in a database. Entity relationships are calculated based on primary key-foreign key relationships i.e. the tables with matching column data.

In OvalEdge, the entity-relationship diagram displayed on the table level displays Tables and other Tables Columns involved in the relationship using flowcharts i.e. rectangles and connecting lines as links between them.

The ER Diagrams of the table are viewed in both the Graphical View and Tabular View. If there are more than five relationships, then a tabular view is used to represent them. Relationships with fewer than five connections are shown in a graphical view.

Importance of Entity Relationships

Some of the key advantages of using Entity Relationships are listed below:

Displays the primary and foreign key relationships.
Helps to identify connections between tables and table columns that are logically related to each other.
Helps to refine table structures, maintain data integrity, accuracy, and minimize redundant data.
Helps to analyze the database to find and resolve problems in logic or deployment.
Easy to implement security methods.

Build Entity Relationships

Different ways to build Entity Relationships in the application:

Relationships

Algorithm Type

Relationships defined at Source

Crawler Settings > select Relationship checkbox

PK-FK relationships are captured from the remote server during the crawling stage.

Advanced Jobs

Discover Primary and Foreign key Recommendations

Primary Key & Foreign Key

Discover relationships automatically

Column Values - Match Count, Base Count, Unique Match Count.

Get Relationships with Column Names

Column Name

Others

Manually

Adjusting relationships manually. Entity Relationships > Tabular View > Adjust Relationships & Calculate Relationships

Query Parsing

The Queries built using JOIN conditions are picked as relations.

Pattern Relationships

Using Pattern-identifying algorithms and adding pattern relationships.

Understand Primary Key & Foreign Key

A Primary Key is a column in a table that contains unique values for each row. Users can think of a primary key as a Social Security number, which is unique and identifies someone even if other people share the same Name or Address.

A Table’s primary key must be unique and cannot be empty or null. To calculate Primary Key, Row count = Distinct count.

For example, in the table below, the customer Id has unique or distinct values without any duplicates or null values, while the First Name, Last Name, and Age columns have duplicate and null values. In fact, in this table, Customer Id is the only column that could be used as the Primary Key.

Foreign Key

A Foreign Key is a column (or set of columns) in a table that has the same set of values as in the Primary key. It links and points to a column with the same data. Notice that the customer ID column values in the Order Details table point to the customer id values in the Customer Details table.

The customer ID column in the Customer Details Table is the Primary Key.
The customer ID column in the Order Details table is the Foreign Key.
The customer ID column in the Books Ordered tables is another Foreign key identified.
The order number column in the Order Details table has references to the order number column in the Books Ordered table.

The Primary Key and Foreign key thus help understand the relationships between tables where the primary key is used as a reference or unique identifier to match Tables containing the same data. An entity (Table) can typically have just one primary key and several relationships with different foreign keys linked with different tables/columns in the database.

Column Relationship

Table relationships are based on the columns in each table that contain the same data. Relationships between tables expose how much of the data from a foreign key column can be seen in the related primary key column and vice versa.

In OvalEdge, users can create, edit, and delete a column relationship manually.

Edit Relationship

To create/ delete a relationship in the relationship window,

Select a column from a primary table and click the Relationship tab to display the relationship window.
Select the database, schema, table, and column name of the secondary table to create a new relationship to the base table and click "Done" to add the new relationship.

To delete a relationship, click the options button and choose the "Delete Relationship" option.

Once a manual relationship is established, the next step is to Calculate the strength of the relationships between these columns.

Pattern Relationship

The Pattern Relationship is one of the key aspects of a data catalog tool. It facilitates discovering the relationship between data points to understand the data better. It also helps identify data assets that have a strong relationship that can be used in various predictive analytic algorithms.

In OvalEdge, to identify such data objects, we have included a pattern match algorithm that will compute a pattern score. There might be several patterns that exist in column data.

Example: A column named credit card number might have a pattern as DDDD-DDDD-DDDD as well as DDDD-DDD-DDDD.

Whenever a similar pattern match is reported, the object is listed and the pattern Match score is computed and displayed. The pattern match score is a percentage score between 0-100 that calculates the similarity between two table-column values. The scores are calculated as a weighted average of the top 50 values.

Pattern match score of each compared column wrt base column =

(Count of rows that have matching patterns to the base column* 50 ) / row count of the secondary table

Example: Assume a secondary table has 909-row count. If there are 122 rows containing pattern DDDDD, Pattern match score=122*50/909

If the Pattern match score is higher, it means more data patterns are matching between the compared data objects, and better the chance of correlation.

When do you see pattern matches?

In OvalEdge, users can generate patterns only between the profiled/analyzed data objects. To generate the patterned relationship, the objects listed should be minimum sample profiled.

At OvalEdge we represent the

Numeric patterns with “D”
Uppercase letters with “U”
Lowercase letters with “L”

Adding patterns to column relationship

Once a user identifies the data objects with a high pattern match score, they can add it to the column relationship, to find how strong or similar the objects are related.

To add a data object to the Relationship,

In Data Catalog >>Table Column, select a table column and click the pattern relationship tab. All the objects with matching patterns are displayed.

To add a column to the relationship tab, select an object, and click the + icon. The selected object is added to the relationship tab.

Refer to Calculating the Relationship Between Columns to know how to calculate the compared column strength.

Deleting the Added Patterns

To delete the added patterns from the Relationship tab, navigate to the column and the Relationship tab and select a column from the list. To delete a relationship, click options and select "Delete Relationship".

The patterns can also be deleted from the Entity relationship tab (tabular view).

Advanced Jobs used to build relationships

1. Get relationships with Column names

This advanced job builds relationships for tables/ columns with identical column names using the Connector ID, Schema ID, Column name, and schema name and on the table catalog, users can view the relationship between the table columns.

Input

Crawler Id/ Database Name, Schema Id/Name, and Recalculate Relationships(True/False).
This advanced job groups all the columns with names matched and having id as a substring in its name in the given schema and builds relations among those columns and the related tables.
If Recalculate Relationships is:
- TRUE: This job calculates scores for all the identified relationships again without checking whether scores are already calculated or not.
- FALSE: This job calculates scores for the identified relationships which do not have scores before.

This job starts calculating scores after building all relationships.

2. Discover relationships automatically

This job gathers all columns from the given schemas’ tables takes the top values of each column and compares them with other column top values and if any top value is matched, it builds relationships between those two tables.

Input

Connector ID / Name.
Schema IDs / Names can be given with comma(,) separated. It is optional.
Build relationships across the schemas(true /false).

By default, it is considered false. (If we give it true it will take across the schema. If you keep empty it is default taken as false).

3. Discover primary and foreign key recommendations auto

It is used to discover all the Primary and Foreign Keys using the formula 'Row Count of a table = Distinct count of a table column.'

To discover the Primary and Foreign Keys, the advanced job first verifies the Row count of the Table, and then the unique values are identified and computed.

On the data column table column summary, you can view the primary key and foreign key information.

Before executing the advanced job, it is essential to verify and adjust the values of the Configuration Keys "pkfk.relation.job.rowcount.check" and "pkfk.relation.job.max.fk.count" to ensure they are set correctly. By default, these values are displayed as 100 and 20, as shown in the below screenshot.

Input

Connector ID/ Name.
Schema IDs/ Names can be given with comma(,) separated. It is optional.
Build relationships across the schemas(true /false).
Delete existing relationships (true/false)

By default, it is considered false. If the attribute is set to true and if any schemas are specified, the job deletes the relationships from such schemas. If no schemas are specified, it deletes the relationships of the given connector's schemas. It deletes the relationships built via this job only, it doesn't touch the other way of relationships. If you keep empty it is taken as false.

If schema IDs are provided as input, this relationship job fetches all primary key columns under the given schema tables where the respective table row count is greater than 100 and the column distinct count is equal to the table row count.
If schema IDs are not provided as input, then the job of this relationship fetches all primary key columns under the available connections on the connected database where the respective table row count is greater than 100 and the column distinct count is equal to the table row count.

Now taking these primary columns as a base, for each primary key column we compare the top 50 values of other columns, if any one top value is matched then we build relationships between those two columns.

Conditions

Schemas must be profiled before running this job.
Only the columns of tables having
1. row_count > 100 and
2. table row count = column distinct count is considered as primary key columns. The difference identified between the 'Discover Relationships Automatically' and 'Discover Primary and Foreign key relationships auto' jobs is that the schemas must be profiled before running 'Project Level Relationships Building'. Both these jobs work on Top value comparison of columns.
Project level Relationships Building

The purpose of this advanced job is to build project-level relationships.

Input

1. Connector ID/ Database name. 2. Schema IDs/ Names can be given with comma(,) separated. It is optional.

To build project-level relationships, we verify whether all the tables added to the project are already fully profiled or not. If not, we initialize profiling jobs for the tables that needed to be fully profiled. After the successful completion of profiling of the top 50 table column values, relationships between columns are built.

We then calculate relationship scores directly by running a query if both the tables are from the same connection (It is an exact calculation), if not from the same connection we do calculate the top 50 values (It is an approximate calculation).

Need to give project name and schema IDs Before running advanced job need that schema should be profiled.

PreviousSearch using Conditional Filter Next9 Dots Action Menu

Last updated 6 months ago

Was this helpful?

hashtagBuilding relationships in OvalEdge

hashtagHow are ER diagrams generated (table level)?

hashtagBuild Entity Relationships

hashtagUnderstand Primary Key & Foreign Key

hashtagColumn Relationship

hashtagEdit Relationship

hashtagPattern Relationship

hashtagAdvanced Jobs used to build relationships

Building relationships in OvalEdge

How are ER diagrams generated (table level)?

Build Entity Relationships

Understand Primary Key & Foreign Key

Column Relationship

Edit Relationship

Pattern Relationship

Advanced Jobs used to build relationships