# Apache Airflow

This article describes the Apache Airflow connector, which enables streamlined metadata management through features such as crawling and lineage building (auto and manual), along with secure authentication via Credential Manager.

<figure><img src="https://1813356899-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FhTnkoJQml0pok9awFDhx%2Fuploads%2FlxupXFhpcdJ7xRZNGU35%2Funknown.png?alt=media&#x26;token=cf5d814a-3d43-4cf5-b55b-de9417629a62" alt=""><figcaption></figcaption></figure>

## **Overview**

### **Connector Details**

| Attribute                                                                          | Details       |
| ---------------------------------------------------------------------------------- | ------------- |
| Connector Category                                                                 | ETL Tool      |
| Connector Version                                                                  | Release 7.2.3 |
| Releases Supported (Available from)                                                | Release 5.0   |
| <p>Connectivity</p><p>\[How the connection is established with Apache Airflow]</p> | REST APIs     |
| Verified Apache Airflow Version                                                    | Version V2    |

{% hint style="info" %}
The Apache Airflow connector is validated against the listed “Verified Apache Airflow Version” and also supports other compatible versions. Submit a support ticket for any validation or metadata crawling issues.
{% endhint %}

### **Connector Features**

<table><thead><tr><th>Feature</th><th align="center" valign="top">Availability</th></tr></thead><tbody><tr><td>Crawling</td><td align="center" valign="top">✅</td></tr><tr><td>Delta Crawling</td><td align="center" valign="top">NA</td></tr><tr><td>Profiling</td><td align="center" valign="top">NA</td></tr><tr><td>Query Sheet</td><td align="center" valign="top">NA</td></tr><tr><td>Data Preview</td><td align="center" valign="top">NA</td></tr><tr><td>Auto Lineage</td><td align="center" valign="top">✅</td></tr><tr><td>Manual Lineage</td><td align="center" valign="top">✅</td></tr><tr><td>Secure Authentication via Credential Manager</td><td align="center" valign="top">✅</td></tr><tr><td>Data Quality</td><td align="center" valign="top">NA</td></tr><tr><td>DAM (Data Access Management)</td><td align="center" valign="top">NA</td></tr><tr><td>Bridge</td><td align="center" valign="top">✅</td></tr></tbody></table>

{% hint style="info" %}
'NA' indicates that the respective feature is 'Not Applicable.'
{% endhint %}

### Metadata Mapping

The following objects are crawled from Apache Airflow and mapped to the corresponding UI assets.

<table><thead><tr><th width="197.6666259765625">Apache Airflow Object</th><th width="212.666748046875">Apache Airflow Attribute</th><th width="180">OvalEdge Attribute</th><th width="180.77783203125">OvalEdge Category</th><th>OvalEdge Type</th></tr></thead><tbody><tr><td>Dags</td><td>Dag</td><td>Code Name</td><td>Codes</td><td>Airflow_Dag</td></tr><tr><td>Tasks</td><td>Task</td><td>Code Name</td><td>Codes</td><td>Airflow_Task</td></tr></tbody></table>
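The mapping above can be sketched in code. This is a hypothetical illustration only: the `map_to_ovaledge` helper and its output shape are assumptions for clarity, not part of the product's actual data model.

```python
# Hypothetical sketch of the metadata mapping table above: crawled Airflow
# DAGs and Tasks both become "Codes" assets in OvalEdge, distinguished by
# their OvalEdge Type. The helper and dict layout are illustrative only.

def map_to_ovaledge(airflow_object: str, name: str) -> dict:
    """Map a crawled Airflow object to its OvalEdge catalog attributes."""
    types = {"Dag": "Airflow_Dag", "Task": "Airflow_Task"}
    if airflow_object not in types:
        raise ValueError(f"Unsupported Airflow object: {airflow_object}")
    return {
        "Code Name": name,                    # OvalEdge Attribute
        "OvalEdge Category": "Codes",         # both objects land under Codes
        "OvalEdge Type": types[airflow_object],
    }

print(map_to_ovaledge("Dag", "daily_sales_etl"))
# {'Code Name': 'daily_sales_etl', 'OvalEdge Category': 'Codes', 'OvalEdge Type': 'Airflow_Dag'}
```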

## Set up a Connection

### Prerequisites

The following prerequisites must be met to establish a connection:

### **Service Account User Permissions**

{% hint style="warning" %}
Use a dedicated service account to establish the connection to the data source, configured with the following minimum set of permissions.
{% endhint %}

{% hint style="info" %}
**👨‍💻Who can provide these permissions?** The Apache Airflow administrator grants these permissions, as regular users may not have sufficient access to assign them.
{% endhint %}

| Operations | Objects | Access Permissions |
| ---------- | ------- | ------------------ |
| Crawling   | Schema  | Read only          |
| Crawling   | Dags    | Read only          |
| Crawling   | Tasks   | Read only          |

{% hint style="info" %}
Grant the service account Read permission via the REST APIs.
{% endhint %}
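A quick way to confirm the service account's read access before configuring the connector is to call the Airflow stable REST API directly. The sketch below is an assumption-laden illustration: the endpoint path comes from the Airflow 2.x stable REST API, while the host, username, and password are placeholders.

```python
# Sketch (assumption): verify the service account can read DAG metadata via
# the Airflow 2.x stable REST API before setting up the connector. The host
# and credentials below are placeholders -- substitute your own values.
import base64
import urllib.request


def read_check_request(base_url: str, path: str,
                       user: str, password: str) -> urllib.request.Request:
    """Build a GET request with HTTP basic auth for an Airflow API path."""
    url = base_url.rstrip("/") + path
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder server and credentials for the service account.
    req = read_check_request("http://airflow.example.com",
                             "/api/v1/dags", "oesauser", "secret")
    with urllib.request.urlopen(req) as resp:  # HTTP 200 => read access OK
        print(resp.status)
```

An HTTP 403 on this call would indicate the service account still lacks the Read permission described above.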

## Connection Configuration Steps

{% hint style="warning" %}
Users must have the Connector Creator role to configure a new connection.
{% endhint %}

1. Log in to OvalEdge, go to Administration > Connectors, click + **(New Connector)**, search for **Airflow**, and complete the required parameters.

   <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>Fields marked with an asterisk (*) are mandatory for establishing a connection.</p></div>

<table><thead><tr><th width="220.666748046875">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Connector Type</td><td>By default, "Airflow" is displayed as the selected connector type.</td></tr><tr><td>Credential Manager*</td><td><p>Select the desired credential manager from the drop-down list. Relevant parameters are displayed based on the selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td>Select the Auto Lineage Add-On checkbox to build data lineage automatically.</td></tr><tr><td>Connector Name*</td><td>Enter a unique name for the Apache Airflow connection (Example: "Airflowdb").</td></tr><tr><td>Connector Environment</td><td>Select the environment (Example: PROD, STG) configured for the connector.</td></tr><tr><td>Connector Description</td><td>Enter a brief description of the connector.</td></tr><tr><td>Server*</td><td>Enter the server name or URL of the Apache Airflow instance (Example: http://airflow-prod-7fxxxx.us-east-1.aws.airflowcloud.com/).</td></tr><tr><td>Local DAG path</td><td>Specifies whether to use a local file system path for storing DAG files instead of a remote or shared location. Select <strong>Yes</strong> to use a local directory for DAG storage, or <strong>No</strong> to use the default or remote DAG storage location configured in Airflow.</td></tr><tr><td>Username*</td><td>Enter the username required to access the Apache Airflow server (Example: "oesauser").</td></tr><tr><td>Password*</td><td>Enter the password associated with the provided username.</td></tr><tr><td>Proxy Enabled*</td><td>Specifies whether the connector should route its requests through a proxy server.</td></tr><tr><td>Plugin Server</td><td>Enter the server name when running the connector as a plugin.</td></tr><tr><td>Plugin Port</td><td>Enter the port number on which the plugin is running.</td></tr></tbody></table>
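Before clicking **Validate**, the Server value can be sanity-checked from the command line. This is a hedged sketch: the `version_url` helper is hypothetical, the endpoint path is from the Airflow 2.x stable REST API, and the server URL is a placeholder.

```python
# Sketch (assumption): normalize the Server field value (tolerating a
# trailing slash, as in the example URL) and probe the Airflow "version"
# endpoint to confirm the server is reachable. The endpoint path is from
# the Airflow 2.x stable REST API; the server URL is a placeholder.
import json
import urllib.request


def version_url(server: str) -> str:
    """Turn a Server field value into the Airflow version endpoint URL."""
    return server.rstrip("/") + "/api/v1/version"


if __name__ == "__main__":
    url = version_url("http://airflow.example.com/")  # placeholder server
    with urllib.request.urlopen(url) as resp:
        print(json.load(resp)["version"])
```

A connection error here usually means the Server value, port, or proxy settings need correcting before validation in OvalEdge will succeed.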

**Default Governance Roles**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>Default Governance Roles*</td><td>Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**Admin Roles**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>Admin Roles*</td><td>Select one or more users from the dropdown list for Integration Admin and Security &#x26; Governance Admin. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**No of Archive Objects**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>No Of Archive Objects*</td><td><p>This shows the number of recent metadata changes to a dataset at the source. By default, it is off. To enable it, toggle the Archive button and specify the number of objects to archive.</p><p>Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.</p></td></tr></tbody></table>

**Bridge**

<table data-header-hidden><thead><tr><th width="220.6666259765625"></th><th></th></tr></thead><tbody><tr><td>Select Bridge*</td><td><p>If applicable, select the bridge from the drop-down list.</p><p>The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.</p></td></tr></tbody></table>

2. After entering all connection details, the following actions can be performed:
   * Click **Validate** to verify the connection.
   * Click **Save** to store the connection for future use.
   * Click **Save & Configure** to apply additional settings before saving.
3. The saved connection will appear on the Connectors home page.

## Manage Connector Operations

### Crawl

{% hint style="warning" %}
To perform crawl operations, users must be assigned the Integration Admin role.
{% endhint %}

The **Crawl/Profile** button allows users to select one or more schemas for crawling.

1. Navigate to the Connectors page and click **Crawl/Profile**.
2. Select the schemas to be crawled.
3. The **Crawl** option is selected by default.
4. Click **Run** to collect metadata from the connected source and load it into the **Data Catalog**.
5. After a successful crawl, the information appears in the **Data Catalog > Codes** tab.

The **Schedule** checkbox enables automated crawling at intervals ranging from a minute to a year.

1. Click the **Schedule** checkbox to enable the **Select Period** drop-down.
2. Select a time period for the operation from the drop-down menu.
3. Click **Schedule** to initiate metadata collection from the connected source.
4. The system will automatically execute the **crawl** operation at the scheduled time.

### Other Operations

The Connectors page provides a centralized view of all configured connectors and their health status.

**Managing connectors includes:**

* **Connector Health**: Displays the current status of each connector, with a green icon for active connections and a red icon for inactive connections, helping monitor connectivity to data sources.
* **Viewing**: Click the Eye icon next to the connector name to view connector details, including Codes, Schemas, and Tasks.

**Nine Dots Menu Options:**

To view, edit, validate, build lineage, configure, or delete connectors, click on the Nine Dots menu.

* **Edit Connector**: Update and revalidate the data source.
* **Validate Connector**: Check the integrity of the connection.
* **Settings**: Modify connector settings.
  * **Lineage**: Select server dialects for parsing and setting connector priority for table lineage.
* **Build Lineage**: Automatically build data lineage using source code parsing.
* **Delete Connector**: Remove a connector with confirmation.

### Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

<table><thead><tr><th width="86.666748046875">S.No.</th><th width="210.33343505859375">Error Message(s)</th><th>Error Description &#x26; Resolution</th></tr></thead><tbody><tr><td>1</td><td>Crawling is a mandatory step before building a lineage.</td><td>Error Description: This error occurs when a lineage build is initiated without performing the required crawling step to collect metadata.<br><br>Resolution: First, run the crawling process for the dataset or source, then proceed to build the lineage.</td></tr></tbody></table>

***

Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA
