# Apache Airflow

This article describes the Apache Airflow connector, which enables streamlined metadata management through features such as crawling and lineage building (auto and manual), along with secure authentication via Credential Manager.

<figure><img src="https://1813356899-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FhTnkoJQml0pok9awFDhx%2Fuploads%2FlxupXFhpcdJ7xRZNGU35%2Funknown.png?alt=media&#x26;token=cf5d814a-3d43-4cf5-b55b-de9417629a62" alt=""><figcaption></figcaption></figure>

## **Overview**

### **Connector Details**

| Attribute                                                                          | Details       |
| ---------------------------------------------------------------------------------- | ------------- |
| Connector Category                                                                 | ETL Tool      |
| Connector Version                                                                  | Release 7.2.3 |
| Releases Supported (Available from)                                                | Release 5.0   |
| <p>Connectivity</p><p>\[How the connection is established with Apache Airflow]</p> | REST APIs     |
| Verified Apache Airflow Version                                                    | Version V2    |

{% hint style="info" %}
The Apache Airflow connector is validated against the listed “Verified Apache Airflow Version” and also supports other compatible versions. Submit a support ticket for any validation or metadata crawling issues.
{% endhint %}

### **Connector Features**

<table><thead><tr><th>Feature</th><th align="center" valign="top">Availability</th></tr></thead><tbody><tr><td>Crawling</td><td align="center" valign="top">✅</td></tr><tr><td>Delta Crawling</td><td align="center" valign="top">NA</td></tr><tr><td>Profiling</td><td align="center" valign="top">NA</td></tr><tr><td>Query Sheet</td><td align="center" valign="top">NA</td></tr><tr><td>Data Preview</td><td align="center" valign="top">NA</td></tr><tr><td>Auto Lineage</td><td align="center" valign="top">✅</td></tr><tr><td>Manual Lineage</td><td align="center" valign="top">✅</td></tr><tr><td>Secure Authentication via Credential Manager</td><td align="center" valign="top">✅</td></tr><tr><td>Data Quality</td><td align="center" valign="top">NA</td></tr><tr><td>DAM (Data Access Management)</td><td align="center" valign="top">NA</td></tr><tr><td>Bridge</td><td align="center" valign="top">✅</td></tr></tbody></table>

{% hint style="info" %}
'NA' indicates that the respective feature is 'Not Applicable.'
{% endhint %}

### Metadata Mapping

The following objects are crawled from Apache Airflow and mapped to the corresponding UI assets.

<table><thead><tr><th width="197.6666259765625">Apache Airflow Object</th><th width="212.666748046875">Apache Airflow Attribute</th><th width="180">OvalEdge Attribute</th><th width="180.77783203125">OvalEdge Category</th><th>OvalEdge Type</th></tr></thead><tbody><tr><td>Dags</td><td>Dag</td><td>Code Name</td><td>Codes</td><td>Airflow_Dag</td></tr><tr><td>Tasks</td><td>Task</td><td>Code Name</td><td>Codes</td><td>Airflow_Task</td></tr></tbody></table>
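The mapping above can be sketched in code. This is a hypothetical illustration only: the `map_to_ovaledge` helper and its output shape are assumptions for clarity, not part of the product's actual data model.

```python
# Hypothetical sketch of the metadata mapping table above: crawled Airflow
# DAGs and Tasks both become "Codes" assets in OvalEdge, distinguished by
# their OvalEdge Type. The helper and dict layout are illustrative only.

def map_to_ovaledge(airflow_object: str, name: str) -> dict:
    """Map a crawled Airflow object to its OvalEdge catalog attributes."""
    types = {"Dag": "Airflow_Dag", "Task": "Airflow_Task"}
    if airflow_object not in types:
        raise ValueError(f"Unsupported Airflow object: {airflow_object}")
    return {
        "Code Name": name,                    # OvalEdge Attribute
        "OvalEdge Category": "Codes",         # both objects land under Codes
        "OvalEdge Type": types[airflow_object],
    }

print(map_to_ovaledge("Dag", "daily_sales_etl"))
# {'Code Name': 'daily_sales_etl', 'OvalEdge Category': 'Codes', 'OvalEdge Type': 'Airflow_Dag'}
```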

## Set up a Connection

### Prerequisites

The following prerequisites must be met to establish a connection:

### **Service Account User Permissions**

{% hint style="warning" %}
Use a dedicated service account to establish the connection to the data source, configured with the following minimum set of permissions.
{% endhint %}

{% hint style="info" %}
**👨‍💻Who can provide these permissions?** The Apache Airflow administrator grants these permissions, as regular users may not have sufficient access to assign them.
{% endhint %}

| Operations | Objects | Access Permissions |
| ---------- | ------- | ------------------ |
| Crawling   | Schema  | Read only          |
| Crawling   | Dags    | Read only          |
| Crawling   | Tasks   | Read only          |

{% hint style="info" %}
Grant the service account Read permission via the REST APIs.
{% endhint %}
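A quick way to confirm the service account's read access before configuring the connector is to call the Airflow stable REST API directly. The sketch below is an assumption-laden illustration: the endpoint path comes from the Airflow 2.x stable REST API, while the host, username, and password are placeholders.

```python
# Sketch (assumption): verify the service account can read DAG metadata via
# the Airflow 2.x stable REST API before setting up the connector. The host
# and credentials below are placeholders -- substitute your own values.
import base64
import urllib.request


def read_check_request(base_url: str, path: str,
                       user: str, password: str) -> urllib.request.Request:
    """Build a GET request with HTTP basic auth for an Airflow API path."""
    url = base_url.rstrip("/") + path
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder server and credentials for the service account.
    req = read_check_request("http://airflow.example.com",
                             "/api/v1/dags", "oesauser", "secret")
    with urllib.request.urlopen(req) as resp:  # HTTP 200 => read access OK
        print(resp.status)
```

An HTTP 403 on this call would indicate the service account still lacks the Read permission described above.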

## Connection Configuration Steps

{% hint style="warning" %}
Users must have the Connector Creator role to configure a new connection.
{% endhint %}

1. Log in to OvalEdge, go to Administration > Connectors, click + **(New Connector)**, search for **Airflow**, and complete the required parameters.

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>Fields marked with an asterisk (*) are mandatory for establishing a connection.</p></div>

<table><thead><tr><th width="220.666748046875">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Connector Type</td><td>By default, "Airflow" is displayed as the selected connector type.</td></tr><tr><td>Credential Manager*</td><td><p>Select the desired credential manager from the drop-down list. Relevant parameters are displayed based on the selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td>Select the Auto Lineage Add-On checkbox to build data lineage automatically.</td></tr><tr><td>Connector Name*</td><td>Enter a unique name for the Apache Airflow connection (Example: "Airflowdb").</td></tr><tr><td>Connector Environment</td><td>Select the environment (Example: PROD, STG) configured for the connector.</td></tr><tr><td>Connector Description</td><td>Enter a brief description of the connector.</td></tr><tr><td>Server*</td><td>Enter the server name or URL of the Apache Airflow instance (Example: http://airflow-prod-7fxxxx.us-east-1.aws.airflowcloud.com/).</td></tr><tr><td>Local DAG path</td><td>Specifies whether to use a local file system path for storing DAG files instead of a remote or shared location. Select <strong>Yes</strong> to use a local directory for DAG storage, or <strong>No</strong> to use the default or remote DAG storage location configured in Airflow.</td></tr><tr><td>Username*</td><td>Enter the username required to access the Apache Airflow server (Example: "oesauser").</td></tr><tr><td>Password*</td><td>Enter the password associated with the provided username.</td></tr><tr><td>Proxy Enabled*</td><td>Specifies whether the connector should route its requests through a proxy server.</td></tr><tr><td>Plugin Server</td><td>Enter the server name when running the connector as a plugin.</td></tr><tr><td>Plugin Port</td><td>Enter the port number on which the plugin is running.</td></tr></tbody></table>
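Before clicking **Validate**, the Server value can be sanity-checked from the command line. This is a hedged sketch: the `version_url` helper is hypothetical, the endpoint path is from the Airflow 2.x stable REST API, and the server URL is a placeholder.

```python
# Sketch (assumption): normalize the Server field value (tolerating a
# trailing slash, as in the example URL) and probe the Airflow "version"
# endpoint to confirm the server is reachable. The endpoint path is from
# the Airflow 2.x stable REST API; the server URL is a placeholder.
import json
import urllib.request


def version_url(server: str) -> str:
    """Turn a Server field value into the Airflow version endpoint URL."""
    return server.rstrip("/") + "/api/v1/version"


if __name__ == "__main__":
    url = version_url("http://airflow.example.com/")  # placeholder server
    with urllib.request.urlopen(url) as resp:
        print(json.load(resp)["version"])
```

A connection error here usually means the Server value, port, or proxy settings need correcting before validation in OvalEdge will succeed.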

**Default Governance Roles**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>Default Governance Roles*</td><td>Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**Admin Roles**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>Admin Roles*</td><td>Select one or more users from the dropdown list for Integration Admin and Security &#x26; Governance Admin. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**No of Archive Objects**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>No Of Archive Objects*</td><td><p>This shows the number of recent metadata changes to a dataset at the source. By default, it is off. To enable it, toggle the Archive button and specify the number of objects to archive.</p><p>Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.</p></td></tr></tbody></table>

**Bridge**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>Select Bridge*</td><td><p>If applicable, select the bridge from the drop-down list.</p><p>The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.</p></td></tr></tbody></table>

2. After entering all connection details, the following actions can be performed:
   * Click **Validate** to verify the connection.
   * Click **Save** to store the connection for future use.
   * Click **Save & Configure** to apply additional settings before saving.
3. The saved connection will appear on the Connectors home page.

## Manage Connector Operations

### Crawl

{% hint style="warning" %}
To perform crawl operations, users must be assigned the Integration Admin role.
{% endhint %}

The **Crawl/Profile** button allows users to select one or more schemas for crawling.

1. Navigate to the Connectors page and click **Crawl/Profile**.
2. Select the schemas to be crawled.
3. The **Crawl** option is selected by default.
4. Click **Run** to collect metadata from the connected source and load it into the **Data Catalog**.
5. After a successful crawl, the information appears in the **Data Catalog > Codes** tab.

The **Schedule** checkbox enables automated crawling at intervals ranging from a minute to a year.

1. Click the **Schedule** checkbox to enable the **Select Period** drop-down.
2. Select a time period for the operation from the drop-down menu.
3. Click **Schedule** to initiate metadata collection from the connected source.
4. The system will automatically execute the **crawl** operation at the scheduled time.

### Other Operations

The Connectors page provides a centralized view of all configured connectors and their health status.

**Managing connectors includes:**

* **Connector Health**: Displays the current status of each connector, with a green icon for active connections and a red icon for inactive connections, helping monitor connectivity to data sources.
* **Viewing**: Click the Eye icon next to the connector name to view connector details, including Codes, Schemas, and Tasks.

**Nine Dots Menu Options:**

To view, edit, validate, build lineage, configure, or delete connectors, click on the Nine Dots menu.

* **Edit Connector**: Update and revalidate the data source.
* **Validate Connector**: Check the integrity of the connection.
* **Settings**: Modify connector settings.
  * **Lineage**: Select server dialects for parsing and setting connector priority for table lineage.
* **Build Lineage**: Automatically build data lineage using source code parsing.
* **Delete Connector**: Remove a connector with confirmation.

### Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

<table><thead><tr><th width="86.666748046875">S.No.</th><th width="210.33343505859375">Error Message(s)</th><th>Error Description &#x26; Resolution</th></tr></thead><tbody><tr><td>1</td><td>Crawling is a mandatory step before building a lineage.</td><td>Error Description: This error occurs when a lineage build is initiated without performing the required crawling step to collect metadata.<br><br>Resolution: First, run the crawling process for the dataset or source, then proceed to build the lineage.</td></tr></tbody></table>

***

Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA
