Ways of Building Auto Lineage

Users can select source codes from the list and build auto lineage by selecting the Nine Dots menu. It provides options to meet specific requirements, making the process flexible. Here is an overview of these options with context:

RDBMS Connectors

Reporting and ETL Connectors

Build Lineage for Selected Codes

This option allows the user to build lineage only on the selected codes.

Build Lineage for New or Changed Codes

This option allows the user to build lineage for the new codes, or if the source has modified codes after the recrawling.

Build Lineage for All Codes

This option initiates the process of lineage building for all available codes within the selected connector, giving users a complete view of the data lineage.

Fetch Code Log and Build Lineage

The platform offers a special lineage-building option for Oracle, SQL Server, PostgreSQL, MySQL, and Snowflake using the code log. This enables users to build lineage using query logs of the source system without crawling the connection again. Users can tailor the selection of the logs as per their needs. The query log setting must be done during connection creation to build the lineage using this option.

Re-build Summary Lineage for Selected Source Code

This option allows the user to rebuild the Summary Lineage only for the selected source code. It updates the simplified data flow view for the selected source without affecting other sources in the instance. This option applies only to reporting and ETL connectors.

Re-build Summary Lineage for All Source Code

This option rebuilds the Summary Lineage for all available source codes in the instance. It refreshes the simplified data flow view across every source, ensuring that the Summary Lineage reflects the latest available information. This option applies only to reporting and ETL connectors.

Query Log Settings

Data crawling can be achieved in two ways: through the crawler and via the query log. Queries play a crucial role in building data lineage and relationships.

With Query Log Settings, They can select the query type and the lookback period. However, the query text is predetermined based on the connector, although users can customize it to meet their specific requirements. Additionally, users can specify the schema from which they want to retrieve the query and exclude specific users. The objective of query log settings is to retrieve the query from the data source, process the queries, and create data lineage for (INSERT) for the specified duration.

Various options to configure query log settings include

Query Log Settings

Description

Query

Displays the queries that are used to retrieve the Query Logs from the schema

Look back Period

This feature allows users to retrieve queries that were processed in the datasource a certain number of days ago. For example, the lookback period can be set to 1 day, two days, or more. The maximum Look back period is seven days.

Include Query Type

Users can include query types such as Select, Insert, Update, and Delete

Exclude Users

In the Query Logs, users have the option to exclude unnecessary users. If a username is specified in the excluded user field, the query log for queries executed by that particular user is excluded from retrieval.

Schema

The user has the option to select a specific schema to apply query log settings. If a user leaves this blank, it will fetch the query log based on all the schema available for that database connection.

Export Source Code to File

In certain instances, the parsing of the source code fails, and the lineage is not built. OvalEdge allows users to export selected source codes in a .zip format (password-protected for security). A click starts a job submission, and the file is quickly downloaded. Share this file with the analysis team for further processing, research, and curation in the dedicated lab environment.

The user must set the value to "False" in “is.ovaledge.lab” (System Settings).

Import Source Code from File

Import source code allows the user to import the corrected source code in JSON format to build the lineage. It usually works under the export functionality, and the user uploads the JSON file from the dedicated lab environment.

The user must set the value to “True” in “is.ovaledge.lab” (System Settings).

Viewing Source Codes

It allows users to access important information for each query on the "Build Auto Lineage" page after they submit and execute a job. This feature gives users detailed insights into creating their data lineage. Here's a close look at the key information for each query.

Schema Name: This field shows the code schema in the connector. It helps understand the code's structure and context, ensuring clarity in the data lineage.

Code Name: The "Code Name" section shows the name of the code. This helps users identify and refer to the code in their lineage quickly.

Lineage Status: Lineage Status" indicates the data lineage progress. It shows the current status of the query in the application. Various statuses are used to communicate the query's progress, making the lineage creation process transparent and clear. The statuses include:

Status

Parse

Lineage Dashboard

Table Type

SUCCESS_LINEAGE_BUILD

100%

YES

Real Table

SUCCESS_WITH_TEMP

100%

YES

Temp Table

SUCCESS_LINEAGE_DOES_NOT_EXIST

100%

–

SUCCESS_LINEAGE_FIXED

100%

YES

Real Table

SUCCESS_LINEAGE_PARTIALLY_BUILD

<100%

YES

Real Table

SUCCESS_LINEAGE_PARTIALLY_BUILDWITH_TEMP

<100%

YES

Real Table + Temp Table

PARSE_FAILED

FAILED

–

FAILED_UNSUPPORTED_QUERY

FAILED

–

SUCCESS_LINEAGE_BUILD_WITH_OvalEdgeLAB

100%

YES

Real Table

SUCCESS_MANUALLY_BUILD_WITH_OvalEdgeLAB

100%

YES

Real Table

SUCCESS_NO LINEAGE BUT ASSOCIATION

100%

Correct Query

OvalEdge helps users fix issues with the "Correct Query" option. Author users with Metawrite permission can see the query by clicking the ‘Eye’ icon or correct the query by clicking the tick mark. This is useful for addressing parsing problems or improving the success of the lineage creation process.

View Code: Users can easily check the code by clicking the ‘Eye’ icon in the "Correct Query" column on the "Build Auto Lineage" page. This shows what's in the code and helps them understand data lineage creation.
Correct Query: The "Correct Query" option allows users to fix code issues, make changes, and revalidate the code, ensuring accurate and reliable data lineage. This feature is crucial for managing errors, adapting to changes, and maintaining a robust data lineage system.

In the application, users can improve and handle their code using the "Correct Query Screen" feature.

Operations in Correct Query

Compare Screens:
1. Shows a side-by-side comparison.
2. The left screen shows the original query.
3. The right screen is a blank workspace for code changes, where users can copy the existing code and make changes as per the requirements.
Reset Code: The "Reset" option restores the query to the previous version. It undoes validation changes, allowing users to make further adjustments or fix parsing problems.
Copy Query: Users can copy the original code from the left window and place it in the right window to edit and build lineage.
Replace: The "Replace" feature lets users replace incorrect strings with correct ones throughout the code with a single click. It will be applied to all the connectors' source codes.
Source Server Type: Users can specify or change the type of source server from which the data originates. It ensures the ETL process applies the correct parsing rules and connection configurations based on the selected server type.
Format Query: The "Format Query" operation aligns the query into a structured format for better readability, understanding, and smoother curation.
Validate: The "Validate" button lets users check the correctness of their edited query, ensuring it follows the required syntax and structure.
Save & Build Lineage: After validations, click the Save & Build Lineage option to save the edited query and start the lineage-building process.
Result: This section shows users the outcome of the query parsing process. It displays parsing results and associated logs, giving a clear overview of the code's status.
Build Lineage using Shutter box: The "Shutter box" feature in OvalEdge allows users to build lineage in various ways:

Lineage: Users can directly select source and destination objects from dropdown menus to build lineage, simplifying data flow definition.
Relationships: Users can select schemas, tables, and columns to establish connections between data elements.
Associations: Users can select objects to associate with queries, creating meaningful connections between data elements.

Error Message: When a query can't parse and shows "Parse failed," the "Error Message" section helps. It gives users the error log, showing specific issues to fix. This information allows users to solve parsing errors.

Viewing Lineage on Code

After successfully building lineage, users can access the queries they worked on. Users can view the lineage for these queries in the Data Catalog > Codes section.

Code Summary Tab:

This section provides a detailed summary of the query. It includes a business description, a technical description, associated business glossary terms, tags, the top users who have interacted with the query, the last crawled date, and the lineage status.

Associations:

The "Associations" tab lists objects directly associated with the query. Understanding these connections is crucial for grasping the query's impact on other data elements in the system.

References:

In the "References" tab, users can see a list of objects that refer to the root object. This information helps users understand where and how the object is used across the system.

Lineage:

In the "Lineage" tab, users can see how the lineage was built. This includes two viewing options:

Graphical
Tabular

Exploring lineage helps users understand the data's journey thoroughly. For more detailed insights into each field, users can refer to the Data Catalog Deep Dive Article.

Schedule

The lineage process can be scheduled in advance for user convenience. To schedule it, navigate to Administration → Job Workflow. Lineage can also be scheduled for new processes or changed queries. For more details on the Schedule, please refer to the schedule section.

Advanced Job

Refresh Has-Lineage: This job refreshes the Has Lineage column in the Data Catalog list page. The column indicates whether the object has lineage or not.

Build Dashboard stats: This job will populate the data into lineage dashboard after migrations or fresh deployments.

For more details on the Advanced Job, please refer to the Advanced Jobs.

PreviousBuild Auto Lineage NextCompare Schemas

Last updated 4 months ago

Was this helpful?

hashtagRDBMS Connectors

hashtagReporting and ETL Connectors

hashtagBuild Lineage for Selected Codes

hashtagBuild Lineage for New or Changed Codes

hashtagBuild Lineage for All Codes

hashtagFetch Code Log and Build Lineage

hashtagRe-build Summary Lineage for Selected Source Code

hashtagRe-build Summary Lineage for All Source Code

hashtagQuery Log Settings

hashtagExport Source Code to File

hashtagImport Source Code from File

hashtagViewing Source Codes

hashtagCorrect Query

hashtagOperations in Correct Query

hashtagViewing Lineage on Code

hashtagCode Summary Tab:

hashtagAssociations:

hashtagReferences:

hashtagLineage:

hashtagSchedule

hashtagAdvanced Job

RDBMS Connectors

Reporting and ETL Connectors

Build Lineage for Selected Codes

Build Lineage for New or Changed Codes

Build Lineage for All Codes

Fetch Code Log and Build Lineage

Re-build Summary Lineage for Selected Source Code

Re-build Summary Lineage for All Source Code

Query Log Settings

Export Source Code to File

Import Source Code from File

Viewing Source Codes

Correct Query

Operations in Correct Query

Viewing Lineage on Code

Code Summary Tab:

Associations:

References:

Lineage:

Schedule

Advanced Job