Ways of Building Auto Lineage

Users can select source codes from the list and build auto lineage by selecting the Nine Dots menu. It provides options to meet specific requirements, making the process flexible. Here is an overview of these options with context:

RDBMS Connectors

Reporting and ETL Connectors

Build Lineage for Selected Codes

This option allows the user to build lineage only on the selected codes.

Build Lineage for New or Changed Codes

This option allows the user to build lineage for the new codes, or if the source has modified codes after the recrawling.

Build Lineage for All Codes

This option initiates the process of lineage building for all available codes within the selected connector, giving users a complete view of the data lineage.

Fetch Code Log and Build Lineage

The platform offers a special lineage-building option for Oracle, SQL Server, PostgreSQL, MySQL, and Snowflake using the code log. This enables users to build lineage using query logs of the source system without crawling the connection again. Users can tailor the selection of the logs as per their needs. The query log setting must be done during connection creation to build the lineage using this option.

Re-build Summary Lineage for Selected Source Code

This option allows the user to rebuild the Summary Lineage only for the selected source code. It updates the simplified data flow view for the selected source without affecting other sources in the instance. This option applies only to reporting and ETL connectors.

Re-build Summary Lineage for All Source Code

This option rebuilds the Summary Lineage for all available source codes in the instance. It refreshes the simplified data flow view across every source, ensuring that the Summary Lineage reflects the latest available information. This option applies only to reporting and ETL connectors.

Query Log Settings

Data crawling can be achieved in two ways: through the crawler and via the query log. Queries play a crucial role in building data lineage and relationships.

With Query Log Settings, They can select the query type and the lookback period. However, the query text is predetermined based on the connector, although users can customize it to meet their specific requirements. Additionally, users can specify the schema from which they want to retrieve the query and exclude specific users. The objective of query log settings is to retrieve the query from the data source, process the queries, and create data lineage for (INSERT) for the specified duration.

Various options to configure query log settings include

Query Log Settings
Description

Query

Displays the queries that are used to retrieve the Query Logs from the schema

Look back Period

This feature allows users to retrieve queries that were processed in the datasource a certain number of days ago. For example, the lookback period can be set to 1 day, two days, or more. The maximum Look back period is seven days.

Include Query Type

Users can include query types such as Select, Insert, Update, and Delete

Exclude Users

In the Query Logs, users have the option to exclude unnecessary users. If a username is specified in the excluded user field, the query log for queries executed by that particular user is excluded from retrieval.

Schema

The user has the option to select a specific schema to apply query log settings. If a user leaves this blank, it will fetch the query log based on all the schema available for that database connection.

Export Source Code to File

In certain instances, the parsing of the source code fails, and the lineage is not built. OvalEdge allows users to export selected source codes in a .zip format (password-protected for security). A click starts a job submission, and the file is quickly downloaded. Share this file with the analysis team for further processing, research, and curation in the dedicated lab environment.

The user must set the value to "False" in “is.ovaledge.lab” (System Settings).

Import Source Code from File

Import source code allows the user to import the corrected source code in JSON format to build the lineage. It usually works under the export functionality, and the user uploads the JSON file from the dedicated lab environment.

The user must set the value to “True” in “is.ovaledge.lab” (System Settings).

Viewing Source Codes

It allows users to access important information for each query on the "Build Auto Lineage" page after they submit and execute a job. This feature gives users detailed insights into creating their data lineage. Here's a close look at the key information for each query.

Schema Name: This field shows the code schema in the connector. It helps understand the code's structure and context, ensuring clarity in the data lineage.

Code Name: The "Code Name" section shows the name of the code. This helps users identify and refer to the code in their lineage quickly.

Lineage Status: Lineage Status" indicates the data lineage progress. It shows the current status of the query in the application. Various statuses are used to communicate the query's progress, making the lineage creation process transparent and clear. The statuses include:

Status
Parse
Lineage Dashboard
Table Type

SUCCESS_LINEAGE_BUILD

100%

YES

Real Table

SUCCESS_WITH_TEMP

100%

YES

Temp Table

SUCCESS_LINEAGE_DOES_NOT_EXIST

100%

NO

SUCCESS_LINEAGE_FIXED

100%

YES

Real Table

SUCCESS_LINEAGE_PARTIALLY_BUILD

<100%

YES

Real Table

SUCCESS_LINEAGE_PARTIALLY_BUILDWITH_TEMP

<100%

YES

Real Table + Temp Table

PARSE_FAILED

FAILED

FAILED_UNSUPPORTED_QUERY

FAILED

SUCCESS_LINEAGE_BUILD_WITH_OvalEdgeLAB

100%

YES

Real Table

SUCCESS_MANUALLY_BUILD_WITH_OvalEdgeLAB

100%

YES

Real Table

SUCCESS_NO LINEAGE BUT ASSOCIATION

100%

NO

-

Correct Query

OvalEdge helps users fix issues with the "Correct Query" option. Author users with Metawrite permission can see the query by clicking the ‘Eye’ icon or correct the query by clicking the tick mark. This is useful for addressing parsing problems or improving the success of the lineage creation process.

  1. View Code: Users can easily check the code by clicking the ‘Eye’ icon in the "Correct Query" column on the "Build Auto Lineage" page. This shows what's in the code and helps them understand data lineage creation.

  2. Correct Query: The "Correct Query" option allows users to fix code issues, make changes, and revalidate the code, ensuring accurate and reliable data lineage. This feature is crucial for managing errors, adapting to changes, and maintaining a robust data lineage system.

In the application, users can improve and handle their code using the "Correct Query Screen" feature.

Operations in Correct Query

  1. Compare Screens:

    1. Shows a side-by-side comparison.

    2. The left screen shows the original query.

    3. The right screen is a blank workspace for code changes, where users can copy the existing code and make changes as per the requirements.

  2. Reset Code: The "Reset" option restores the query to the previous version. It undoes validation changes, allowing users to make further adjustments or fix parsing problems.

  3. Copy Query: Users can copy the original code from the left window and place it in the right window to edit and build lineage.

  4. Replace: The "Replace" feature lets users replace incorrect strings with correct ones throughout the code with a single click. It will be applied to all the connectors' source codes.

  5. Source Server Type: Users can specify or change the type of source server from which the data originates. It ensures the ETL process applies the correct parsing rules and connection configurations based on the selected server type.

  6. Format Query: The "Format Query" operation aligns the query into a structured format for better readability, understanding, and smoother curation.

  7. Validate: The "Validate" button lets users check the correctness of their edited query, ensuring it follows the required syntax and structure.

  8. Save & Build Lineage: After validations, click the Save & Build Lineage option to save the edited query and start the lineage-building process.

  9. Result: This section shows users the outcome of the query parsing process. It displays parsing results and associated logs, giving a clear overview of the code's status.

  10. Build Lineage using Shutter box: The "Shutter box" feature in OvalEdge allows users to build lineage in various ways:

  • Lineage: Users can directly select source and destination objects from dropdown menus to build lineage, simplifying data flow definition.

  • Relationships: Users can select schemas, tables, and columns to establish connections between data elements.

  • Associations: Users can select objects to associate with queries, creating meaningful connections between data elements.

Error Message: When a query can't parse and shows "Parse failed," the "Error Message" section helps. It gives users the error log, showing specific issues to fix. This information allows users to solve parsing errors.

Viewing Lineage on Code

After successfully building lineage, users can access the queries they worked on. Users can view the lineage for these queries in the Data Catalog > Codes section.

Code Summary Tab:

This section provides a detailed summary of the query. It includes a business description, a technical description, associated business glossary terms, tags, the top users who have interacted with the query, the last crawled date, and the lineage status.

Associations:

The "Associations" tab lists objects directly associated with the query. Understanding these connections is crucial for grasping the query's impact on other data elements in the system.

References:

In the "References" tab, users can see a list of objects that refer to the root object. This information helps users understand where and how the object is used across the system.

Lineage:

In the "Lineage" tab, users can see how the lineage was built. This includes two viewing options:

  • Graphical

  • Tabular

AI-Augmented Lineage

AI-Augmented Lineage enhances the lineage-building process by extending the existing deterministic parsing framework with AI-driven intelligence. This capability improves lineage coverage for complex transformation logic, reduces dependency on manual intervention, and accelerates support for new or evolving data patterns.

The feature operates within the existing lineage architecture and integrates seamlessly into the Auto Lineage workflow without altering the core lineage-building mechanisms.

AI-Augmented Parsing Behavior

During lineage construction, the application continues to execute standard deterministic parsing for all supported patterns. When the parser encounters unsupported, complex, or unrecognized logic, the system automatically invokes AI-based parsing to infer lineage.

This fallback-based approach ensures:

  • Continuity of lineage generation without failure

  • Extended coverage for complex transformation logic

  • Preservation of reliability from deterministic parsing while augmenting it with AI capabilities

The AI-based parsing operates only when required and does not override existing parser results.

Handling Complex Transformations

AI-Augmented Lineage improves extraction for scenarios that are traditionally difficult to process using rule-based parsing, including:

  • Nested queries and multi-layer transformations

  • Complex scripting logic (e.g., M-code in reporting tools)

  • Non-standard or evolving query patterns

This ensures improved completeness of lineage across data pipelines, especially in reporting and ETL systems.

Confidence-Based Lineage Processing

All AI-inferred lineage is evaluated using a confidence scoring mechanism.

  • High-confidence lineage results are automatically merged into the lineage graph

  • Lower-confidence results are not directly applied and are flagged for review

Flagged lineage appears as pending lineage, allowing users to validate and accept it before inclusion. This ensures accuracy while maintaining governance control through a human-in-the-loop approach.

Data Security through Pseudonymization

Before processing any source code through AI models, sensitive information is protected using pseudonymization.

  • Identifiers such as database names, schema names, table names, and credentials are replaced with neutral tokens

  • AI processing is performed on masked data

  • After lineage extraction, the system restores original values through a controlled de-pseudonymization process

This ensures that no sensitive data is exposed externally while maintaining accurate lineage reconstruction.

Zero-Deployment Template Management

AI-Augmented Lineage supports a zero-deployment enhancement model.

Administrators can update:

  • Parsing templates (code templates)

  • AI prompt templates

These templates are version-controlled and stored within the application. Updates can be applied without requiring application redeployment, enabling faster adaptation to new transformation patterns and reducing turnaround time for lineage improvements.

Feature Enablement and Configuration

AI-Augmented Lineage is controlled through both global and connector-level configurations.

  • A system-level setting enables AI-based lineage capabilities across the instance

  • Connector-level configuration allows selective enablement under the AI Lineage settings

This enables phased adoption and controlled rollout across different data sources.

Supported Connectors

AI-Augmented Lineage is currently supported for:

  • RDBMS connectors

  • Reporting connectors such as Power BI (with a focus on complex transformation logic)

Support is continuously expanding to additional connectors and transformation frameworks.

Integration with Auto Lineage

AI-Augmented Lineage is integrated into the existing Build Auto Lineage workflow.

  • AI parsing is triggered automatically during lineage job execution when parsing gaps are identified

  • Confidence scoring is applied as part of lineage processing

  • Results are reflected within lineage status and review workflows

Impact on Lineage Visualization

AI-generated lineage integrates into existing lineage views:

  • Accepted lineage becomes part of the standard lineage graph

  • Pending lineage can be reviewed and validated before inclusion

  • Lineage visualization continues to support both Summary and Detailed views without structural changes

Exploring lineage helps users understand the data's journey thoroughly. For more detailed insights into each field, users can refer to the Data Catalog Deep Dive Article.

Schedule

The lineage process can be scheduled in advance for user convenience. To schedule it, navigate to Administration → Job Workflow. Lineage can also be scheduled for new processes or changed queries. For more details on the Schedule, please refer to the schedule section.

Advanced Job

Refresh Has-Lineage: This job refreshes the Has Lineage column in the Data Catalog list page. The column indicates whether the object has lineage or not.

Build Dashboard stats: This job will populate the data into lineage dashboard after migrations or fresh deployments.

For more details on the Advanced Job, please refer to the Advanced Jobs.


Copyright © 2025, OvalEdge LLC, Peachtree Corners, GA, USA.

Last updated

Was this helpful?