# Azure Data Lake

This article describes the Azure Data Lake connector, which supports metadata management through features such as crawling and data preview, and authenticates securely through Credential Manager.

<figure><img src="/files/tHR01AxkIMF2JSsvCi4a" alt=""><figcaption></figcaption></figure>

### Overview

#### Connector Details

| Connector Category                                                                  | Cloud Storage              |
| ----------------------------------------------------------------------------------- | -------------------------- |
| OvalEdge Releases Supported                                                         | Release5.0 to Release7.1.1 |
| <p>Connectivity</p><p>\[How the connection is established with Azure Data Lake]</p> | ADL SDK                    |

#### Connector Features

| Feature                                      | Availability |
| -------------------------------------------- | :----------: |
| Crawling                                     |       ✅      |
| Delta Crawling                               |       ❌      |
| Profiling\*                                  |       ❌      |
| Sample Profiling                             |       ✅      |
| Query Sheet                                  |       ❌      |
| Data Preview                                 |       ✅      |
| Auto Lineage                                 |       ❌      |
| Manual Lineage                               |       ✅      |
| Secure Authentication via Credential Manager |       ✅      |
| Data Quality                                 |       ✅      |
| DAM (Data Access Management)                 |       ❌      |
| Bridge                                       |       ✅      |

{% hint style="info" %}
\*Full profiling is supported through DuckDB. To enable this capability, configure the system setting (key: enable.duckdb) to **True**.
{% endhint %}

#### Metadata Mapping

The following objects are crawled from Azure Data Lake and mapped to the corresponding UI assets.

<table><thead><tr><th width="200.4000244140625">Azure Data Lake Object</th><th width="213.79998779296875">Azure Data Lake Attribute</th><th width="170.4000244140625">OvalEdge Attribute</th><th width="176.7999267578125">OvalEdge Category</th><th width="163.199951171875">OvalEdge Type</th></tr></thead><tbody><tr><td>File/Folder</td><td>Folder</td><td>Folder</td><td>Folder</td><td>Folder</td></tr><tr><td>File</td><td>File</td><td>File</td><td>File</td><td>-</td></tr><tr><td>File</td><td>XLSX</td><td>Folder(subfile)</td><td>Folder(subfile)</td><td>Folder(subfile)</td></tr><tr><td>File</td><td>XLS</td><td>Folder(subfile)</td><td>Folder(subfile)</td><td>Folder(subfile)</td></tr><tr><td>File</td><td>CSV</td><td>File</td><td>File</td><td>File</td></tr><tr><td>File</td><td>TXT</td><td>File</td><td>File</td><td>File</td></tr><tr><td>File</td><td>PARQUET</td><td>File</td><td>File</td><td>File</td></tr><tr><td>File</td><td>ORC</td><td>File</td><td>File</td><td>File</td></tr><tr><td>File</td><td>JSON</td><td>File</td><td>File</td><td>File</td></tr><tr><td>File</td><td>YAML</td><td>File</td><td>File</td><td>File</td></tr><tr><td>File</td><td>PIP</td><td>File</td><td>File</td><td>File</td></tr></tbody></table>

### Set up a Connection

#### Prerequisites

The following are the prerequisites to establish a connection:

Ensure that CSV files follow the required formatting standards for proper data processing and visibility. Refer to [CSV Format Requirements](https://docs.ovaledge.com/release8.1/connectors/additional-requirements/csv-format-requirements-for-file-connectors).

**Service Account User Permissions**

{% hint style="warning" %}
It is recommended to use a separate service account to establish the connection to the data source, configured with the following minimum set of permissions.
{% endhint %}

{% hint style="info" %}
👨‍💻 Who can provide these permissions? These permissions are typically granted by the Azure Data Lake administrator, as users may not have the required access to assign them independently.
{% endhint %}

| Operation            | Objects          | Access Permission |
| -------------------- | ---------------- | ----------------- |
| Connector Validation | Containers       | Read              |
| Crawling             | Containers       | Read              |
| Crawling & Profiling | Buckets          | Read              |
| Crawling & Profiling | Folder           | Read              |
| Crawling & Profiling | Files            | Read              |
| View Data            | profile/Get Data | Read              |

{% hint style="info" %}
**Required Permissions**

* Ensure the following Azure permissions are assigned:
  * Microsoft.Storage/storageAccounts/blobServices/containers/read
  * Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
* If ACLs are enabled in ADLS Gen2, also configure the following access controls:
  * Folder access (traverse/list): `--x`, `r-x`
  * File access (read): `r--`
* These permissions are required for successful access and operations.
  {% endhint %}
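
The ACL rule above can be illustrated with a short sketch: every folder on the way to a file needs execute (`x`) to be traversed, and the file itself needs read (`r`). The function name, paths, and permission values below are hypothetical examples, not part of OvalEdge or the Azure SDK, and the check is a simplification that looks only at one three-character permission string per object.

```python
def can_read_file(path, folder_perms, file_perm):
    """Return True if every parent folder grants execute and the file grants read.

    path         -- 'container/folder/.../file' (hypothetical example path)
    folder_perms -- dict mapping each parent folder path to a string such as 'r-x'
    file_perm    -- permission string for the file, such as 'r--'
    """
    parts = path.strip("/").split("/")
    # Parent folders are every prefix of the path except the file itself.
    parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
    for folder in parents:
        # A folder with no entry is treated as having no access (assumption).
        perm = folder_perms.get(folder, "---")
        if len(perm) != 3 or perm[2] != "x":
            return False
    return len(file_perm) == 3 and file_perm[0] == "r"


# Hypothetical ACLs: 'raw' can be listed, 'raw/sales' can only be traversed.
acls = {"raw": "r-x", "raw/sales": "--x"}
print(can_read_file("raw/sales/2024.csv", acls, "r--"))
```

If any folder in the chain lacks `x`, the file is unreachable even when the file itself grants read.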

#### Connection Configuration Steps

{% hint style="info" %}
Users must have the Connector Creator role to configure a new connection.
{% endhint %}

1. Log in to OvalEdge, go to **Administration > Connectors**, click **+ (New Connector)**, search for **Azure Data Lake**, and complete the required parameters.

{% hint style="info" %}
Fields marked with an asterisk (\*) are mandatory for establishing a connection.
{% endhint %}

<table><thead><tr><th width="219.800048828125">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Connector Type</td><td>By default, "Azure Data Lake" is displayed as the selected connector type.</td></tr><tr><td>Credential Manager*</td><td><p>Select the desired credentials manager from the drop-down list. Relevant parameters will be displayed based on your selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp Vault</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add-ons</td><td>Select the checkbox for Data Quality Add-On to identify data quality issues using data anomaly detection.</td></tr><tr><td>Connector Environment</td><td>Select the environment (<strong>Example</strong>: PROD, STG) configured for the connector.</td></tr><tr><td>Connector Name*</td><td><p>Enter a unique name for the Azure Data Lake connection              </p><p>(<strong>Example</strong>: "Azure_Data_Lake").</p></td></tr><tr><td>Connector description</td><td>Enter a brief description of the connector.</td></tr><tr><td>Authentication Type*</td><td><p>The following two types of authentication are supported</p><p> for Azure Data Lake:</p><ul><li>ADL String</li><li>ADL Service Principal</li></ul></td></tr><tr><td>Client Id*</td><td><p>Enter the Client ID (Application ID), which uniquely identifies the registered application.</p><p><strong>Note:</strong> The field will appear only if the authentication is selected as “ADL Service Principal”.</p></td></tr><tr><td>Client Secret*</td><td><p>Enter the Client Secret, which is used by the application to authenticate and request tokens.</p><p><strong>Note:</strong> The field will appear only if the authentication is selected as “ADL Service Principal”.</p></td></tr><tr><td>Tenant Id*</td><td><p>Provide the Tenant ID (Directory ID) that identifies the Azure Active Directory instance used for authentication.</p><p><strong>Note:</strong> The field will appear 
only if the authentication is selected as “ADL Service Principal”.</p></td></tr><tr><td>ADL Endpoint*</td><td><p>Provide the URL used to interact with ADL storage accounts.</p><p><strong>Note:</strong> The field will appear only if the authentication is selected as “ADL Service Principal”.</p></td></tr><tr><td>ADL Connection String*</td><td><p>Enter the ADL connection string that was generated at the Azure storage account.</p><p><strong>Note:</strong> The field will appear only if the authentication is selected as “ADL String”.</p></td></tr></tbody></table>

| **Default Governance Roles** |                                                                                                                                                                                                                                                                                                                                      |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Default Governance Roles\*   | Select the appropriate users or teams for each governance role from the drop-down list. All users and teams configured in OvalEdge Security are displayed for selection.                                                                                                                                                             |
| **Admin Roles**              |                                                                                                                                                                                                                                                                                                                                      |
| Admin Roles\*                | Select one or more users from the dropdown list for Integration Admin and Security & Governance Admin. All users configured in OvalEdge Security are available for selection.                                                                                                                                                        |
| **No of Archive Objects**    |                                                                                                                                                                                                                                                                                                                                      |
| No Of Archive Objects\*      | <p>This shows the number of recent metadata changes to a dataset at the source. By default, it is off. To enable it, toggle the Archive button and specify the number of objects to archive.</p><p>Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.</p> |
| **Bridge**                   |                                                                                                                                                                                                                                                                                                                                      |
| Select Bridge\*              | <p>If applicable, select the bridge from the drop-down list.</p><p>The drop-down list displays all active bridges configured in OvalEdge. These bridges enable communication between data sources and OvalEdge without altering firewall rules.</p>                                                                                  |

2. After entering all connection details, the following actions can be performed:
   1. Click **Validate** to verify the connection.
   2. Click **Save** to store the connection for future use.
   3. Click **Save & Configure** to apply additional settings before saving.
3. The saved connection will appear on the Connectors home page.
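
When the ADL Service Principal option is used, it can help to confirm outside OvalEdge that the Client Id, Client Secret, and Tenant Id are accepted by Azure AD. The sketch below only builds an OAuth 2.0 client-credentials token request against `login.microsoftonline.com`; it does not send it. The function name and the placeholder values are hypothetical, and this is not a description of how OvalEdge authenticates internally.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request for Azure Storage."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Scope for Azure Storage resources.
        "scope": "https://storage.azure.com/.default",
    }).encode("ascii")
    return Request(url, data=body, method="POST")


# Placeholder values for illustration only.
req = build_token_request("example-tenant-id", "example-client-id", "example-secret")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; an HTTP error response at that point usually indicates a wrong or expired value in one of the three fields.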

### Manage Connector Operations

#### Crawl/Profile

{% hint style="info" %}
To perform crawl operations, users must be assigned the Integration Admin role.
{% endhint %}

1. Navigate to the **Connectors** page and click **Crawl/Profile**.
2. This action initiates the metadata collection process from the data source and loads the retrieved metadata into **File Manager > File Explorer**.
3. In the File Manager, click the connector name, select the specific **folder(s) or file(s)**, then select **Catalog / Catalog and Profile** from the **Nine Dots** menu. For more details, see [File Explorer](https://docs.ovaledge.com/release8.1/file-manager/file-explorer).
4. The selected files or folders will be added to the **Data Catalog > Files/File Columns** tab.

#### Other Operations

The **Connectors** page in OvalEdge provides a centralized view of all configured connectors, including their health status.

**Managing connectors includes:**

* **Connectors Health:** Displays the current status of each connector using a green icon for active connections and a red icon for inactive connections, helping to monitor the connectivity with data sources.
* **Viewing:** Click the Eye icon next to the connector name to view connector details, including databases, tables, columns, and codes.

**Nine Dots Menu Options:**

To view, edit, validate, configure, or delete connectors, click the Nine Dots menu.

* **Edit Connector:** Update and revalidate the data source.
* **Validate Connector:** Check the connection's integrity.
* **Settings:** Modify connector settings.
  * **Crawler:** Configure data extraction.
  * **Access Instructions:** Add notes on how data can be accessed.
  * **Business Glossary Settings:** Manage term associations at the connector level.
  * **Anomaly Detection Settings**: Configure anomaly detection preferences at the connector level.
* **Delete Connector:** Remove a connector with confirmation.

#### Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

<table data-header-hidden><thead><tr><th width="87.20001220703125">S.No.</th><th width="271.800048828125">Error Message(s)</th><th>Error Description &#x26; Resolution</th></tr></thead><tbody><tr><td>1</td><td>Connection Validation Failure</td><td><p><strong>Error Description:</strong></p><ul><li>Connection validation fails due to incorrect credentials, an invalid connection string, missing containers, or network issues.</li></ul><p><strong>Resolution:</strong></p><ul><li>Verify connection string format (must start with DefaultEndpointsProtocol=https)</li><li>Check Client ID, Client Secret, and Tenant ID</li><li>Ensure the storage account exists</li><li>Confirm at least one container is available</li><li>Validate network connectivity</li></ul></td></tr><tr><td>2</td><td>Invalid Client Secret / Authentication Failure</td><td><p><strong>Error Description:</strong></p><p>Authentication fails when the client secret is incorrect, expired, or mismatched.</p><p><strong>Resolution:</strong></p><ul><li>Verify Client Secret value (copy correctly from Azure Portal)</li><li>Check if the secret has expired</li><li>Generate a new secret if required</li><li>Ensure Client ID and Tenant ID are correct</li></ul></td></tr><tr><td>3</td><td>Authorization Failure (Access Denied)</td><td><p><strong>Error Description:</strong></p><p>The service principal does not have the required permissions to access the storage account.</p><p><strong>Resolution:</strong></p><ul><li>Assign required roles:</li><li>Ensure the role is assigned at the storage account level</li><li>Verify access in the Azure Portal</li></ul></td></tr><tr><td>4</td><td>Resource Not Found (File / Container / Path)</td><td><p><strong>Error Description:</strong></p><p>The file, container, or path does not exist or is incorrectly specified.</p><p><strong>Resolution:</strong></p><ul><li>Verify the resource exists in the Azure Portal</li><li>Check the container name and file path</li><li>Use correct format: 
container/folder/file</li><li>Ensure no extra or missing slashes</li></ul></td></tr><tr><td>5</td><td>No Containers Found During Validation</td><td><p><strong>Error Description:</strong></p><p>Validation fails when the storage account has no containers or permissions to list containers are missing.</p><p><strong>Resolution:</strong></p><ul><li>Create at least one container in the storage account</li><li>Verify permissions to list containers</li><li>Confirm access using Azure Portal</li></ul></td></tr><tr><td>6</td><td>No files found in the container</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when container is empty, path is incorrect, or permissions are missing.</li></ul><p><strong>Resolution:</strong></p><ul><li>Verify files exist in container</li><li>Check folder path</li><li>Confirm permissions</li><li>Validate container name</li></ul></td></tr><tr><td>7</td><td>Container listing failed</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs due to missing permissions, invalid storage account, or network issues.</li></ul><p><strong>Resolution:</strong></p><ul><li>Assign role: Storage Blob Data Reader</li><li>Verify storage account</li><li>Check network connectivity</li><li>Validate authentication</li></ul></td></tr><tr><td>8</td><td>File download failed</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when file does not exist, permissions are missing, or network issues occur.</li></ul><p><strong>Resolution:</strong></p><ul><li>Verify file exists</li><li>Ensure download permissions</li><li>Check network stability</li><li>Validate encryption access if applicable</li></ul></td></tr><tr><td>9</td><td>Access Denied</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when Client ID, Secret, or Tenant ID is incorrect, expired, or permissions are missing.</li></ul><p><strong>Resolution:</strong></p><ul><li>Verify Client ID, Client Secret, Tenant ID</li><li>Check if the secret is expired</li><li>Ensure service 
principal is active</li><li>Assign roles:</li></ul></td></tr><tr><td>10</td><td>Authentication / Token error</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when credentials are invalid, expired, or Azure AD authentication fails.</li></ul><p><strong>Resolution:</strong></p><ul><li>Verify Client ID, Secret, Tenant ID</li><li>Check secret expiry</li><li>Ensure Tenant ID is correct</li><li>Validate access to: login.microsoftonline.com</li></ul></td></tr><tr><td>11</td><td>Slow file listing</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when containers have large number of files or high network latency.</li></ul><p><strong>Resolution:</strong></p><ul><li>Wait for pagination (batch loading)</li><li>Use specific folder paths</li><li>Check network speed</li></ul></td></tr><tr><td>12</td><td>Operation timeout</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs due to slow network, large data volume, or low timeout settings.</li></ul><p><strong>Resolution:</strong></p><ul><li>Check the internet connection</li><li>Retry operation</li><li>Perform during off-peak hours</li><li>Increase timeout settings if possible</li></ul></td></tr><tr><td>13</td><td>CSV column names not detected</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when the file lacks a header, uses an incorrect delimiter, or has encoding issues.</li></ul><p><strong>Resolution:</strong></p><ul><li>Ensure the header row exists</li><li>Use standard delimiters (comma, tab, semicolon)</li><li>Save file in UTF-8</li><li>Avoid special characters in column names</li></ul></td></tr><tr><td>14</td><td>Excel file cannot be read</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs when the file is corrupted, unsupported, password-protected, or too large.</li></ul><p><strong>Resolution:</strong></p><ul><li>Use .xls or .xlsx format</li><li>Remove password protection</li><li>Verify file opens in Excel</li><li>Reduce file size if 
needed</li></ul></td></tr><tr><td>15</td><td>Data type detection failed</td><td><p><strong>Error Description:</strong></p><ul><li>Occurs due to insufficient data, mixed data types, or inconsistent formats.</li></ul><p><strong>Resolution:</strong></p><ul><li>Provide at least 20–30 rows</li><li>Maintain consistent data types</li><li>Use standard date formats (YYYY-MM-DD)</li></ul></td></tr></tbody></table>
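
The connection string checks listed for error 1 can be run before pasting the value into the connector form. The following is a minimal sketch, assuming the usual `Key=Value;Key=Value` layout of an Azure storage connection string; the function name and the sample account values are hypothetical.

```python
def check_connection_string(conn_str):
    """Return a list of problems found; an empty list means the basic shape looks right."""
    fields = {}
    for part in conn_str.strip().split(";"):
        if not part:
            continue
        # Split on the first '=' only, because account keys may end with '=' padding.
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()

    problems = []
    if not conn_str.strip().startswith("DefaultEndpointsProtocol=https"):
        problems.append("must start with DefaultEndpointsProtocol=https")
    for required in ("AccountName", "AccountKey"):
        if not fields.get(required):
            problems.append(f"missing {required}")
    return problems


# Sample value for illustration only; not a real account or key.
sample = ("DefaultEndpointsProtocol=https;AccountName=examplestore;"
          "AccountKey=abc123==;EndpointSuffix=core.windows.net")
print(check_connection_string(sample))
```

This checks only the format. A well-formed string can still fail validation if the key is wrong or the storage account does not exist.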

#### FAQs

<details>

<summary>Why does connection string authentication fail?</summary>

It fails when the connection string format is incorrect, the account key is wrong, or the storage account does not exist. Check that the string starts with DefaultEndpointsProtocol=https, verify AccountName and AccountKey, and confirm the storage account exists.

</details>

<details>

<summary>Why are folders not showing correctly?</summary>

Folders do not appear when the path format is incorrect or when the results are still loading. Verify the path structure and wait for all results to load.

</details>

<details>

<summary>Why is file download slow or timing out?</summary>

Downloads are slow due to large files, a slow network, or regional differences. Check your network speed, try during off-peak hours, or increase timeout settings.

</details>

<details>

<summary>Why can’t I see file details?</summary>

You cannot see file details if permissions are missing or the file does not exist. Verify file existence, check permissions, and refresh the connection.

</details>

<details>

<summary>Why can’t I upload files?</summary>

Upload fails when permissions are missing, the path is invalid, or the network is unstable. Verify upload permissions, container existence, and path format.

</details>

<details>

<summary>What role is required for file operations?</summary>

Use Storage Blob Data Reader for read operations and Storage Blob Data Contributor for upload or write operations. Assign roles at the storage account level.

</details>

<details>

<summary>Why do I see connection or client errors?</summary>

These errors occur due to network issues, SSL problems, or incorrect proxy settings. Check your network, system time, and proxy configuration, then retry.

</details>

<details>

<summary>Why does the setup fail or the client not initialize?</summary>

Setup fails when the authentication or connection setup does not complete. Verify credentials, storage account, and network connectivity, then retry.

</details>

<details>

<summary>Why do operations time out?</summary>

Operations time out due to a slow network or large data responses. Retry the operation, or improve network speed and increase timeout settings.

</details>

<details>

<summary>Why are CSV column names not detected?</summary>

This happens when the file has no header row or uses an incorrect delimiter or encoding. Add a header row and use standard delimiters with UTF-8 encoding.

</details>
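
A file can be checked locally for these three conditions before it is uploaded. The sketch below is a rough heuristic, not the detection logic OvalEdge uses: it assumes the candidate delimiters are comma, tab, and semicolon, and it treats a first row with only non-empty, non-numeric cells as a header. The function names are hypothetical.

```python
import csv
import io


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def inspect_csv(raw_bytes, delimiters=(",", "\t", ";")):
    """Report encoding, delimiter, and header findings for the start of a CSV file."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return {"utf8": False, "delimiter": None, "columns": [], "has_header": False}

    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # Pick the candidate delimiter that occurs most often in the first line.
    best = max(delimiters, key=first_line.count)
    delimiter = best if first_line.count(best) > 0 else None

    columns = next(csv.reader(io.StringIO(text), delimiter=delimiter or ","), [])
    # Heuristic: a header row has only non-empty, non-numeric cells.
    has_header = bool(columns) and all(c.strip() and not _is_number(c) for c in columns)
    return {"utf8": True, "delimiter": delimiter, "columns": columns, "has_header": has_header}


print(inspect_csv(b"id;name;amount\n1;Ann;10.5\n"))
```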

<details>

<summary>Why can’t the system read Excel files?</summary>

This happens when the file is corrupted, unsupported, or too large. Use .xls or .xlsx format, remove protection, and reduce file size if needed.

</details>

<details>

<summary>Why are data types detected incorrectly?</summary>

This happens when data is inconsistent, insufficient, or incorrectly formatted. Use consistent data types and standard formats, and include enough sample rows.

</details>

<details>

<summary>What is the difference between a Connection String and a Service Principal?</summary>

A connection string uses account keys and is simple to set up. A service principal uses Azure AD and is more secure. Use the connection string for testing and the service principal for production.

</details>

<details>

<summary>Why is the endpoint URL important?</summary>

It tells the system which storage account to connect to. Use the format: `https://<storage-account-name>.blob.core.windows.net`

</details>
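
As an illustration, the endpoint value can be checked for the expected shape before it is entered in the **ADL Endpoint** field. This sketch assumes the blob endpoint suffix given in the FAQ above and the Azure naming rule that storage account names are 3 to 24 lowercase letters and digits; the function name and sample account are hypothetical.

```python
import re
from urllib.parse import urlparse


def account_from_endpoint(url):
    """Return the storage account name from an endpoint URL, or None if the shape is wrong."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    suffix = ".blob.core.windows.net"
    if parsed.scheme != "https" or not host.endswith(suffix):
        return None
    account = host[: -len(suffix)]
    # Storage account names: 3-24 characters, lowercase letters and digits only.
    return account if re.fullmatch(r"[a-z0-9]{3,24}", account) else None


print(account_from_endpoint("https://examplestore.blob.core.windows.net"))
```

A result of `None` means the URL is missing the account name, uses `http`, or has a different host suffix.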

<details>

<summary>Why must the connection string follow a specific format?</summary>

The system cannot parse or authenticate an incorrect format. Always use the exact format from the Azure Portal.

</details>

<details>

<summary>Why is Tenant ID required?</summary>

Tenant ID identifies your Azure AD directory for authentication. Copy it from the Azure Portal.

</details>

<details>

<summary>Why do operations fail when using a proxy?</summary>

Failures occur when proxy settings are incorrect or do not support HTTPS. Verify proxy configuration and authentication settings.

</details>

<details>

<summary>Why can’t I connect to Azure?</summary>

Connection fails due to firewall, DNS, or network restrictions. Allow outbound HTTPS (port 443) and verify DNS and network rules.

</details>

<details>

<summary>Why is file summary not available?</summary>

This happens when the file does not exist or when permissions are missing. Verify file existence and permissions.

</details>

<details>

<summary>Why are file paths not recognized?</summary>

This happens when the path format is incorrect or contains special characters. Use the format: container/folder/file.

</details>
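
The `container/folder/file` format can be checked with a small helper such as the one sketched below. It rejects the common mistakes mentioned in the troubleshooting table (extra or missing slashes). The function name and sample paths are hypothetical.

```python
def split_adl_path(path):
    """Split 'container/folder/file' into (container, path_in_container).

    Raises ValueError when the path has extra or missing slashes.
    """
    if path != path.strip():
        raise ValueError("leading or trailing whitespace")
    if path.startswith("/") or path.endswith("/"):
        raise ValueError("extra leading or trailing slash")
    if "//" in path:
        raise ValueError("empty segment (double slash)")
    container, sep, rest = path.partition("/")
    if not sep or not rest:
        raise ValueError("expected container/folder/file")
    return container, rest


print(split_adl_path("raw/sales/2024.csv"))
```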

<details>

<summary>Why does folder validation fail?</summary>

It fails when the path is incorrect or no files exist in that folder. Verify the path and ensure files exist.

</details>

<details>

<summary>Why does pagination fail?</summary>

Pagination fails due to large data volume or timeout issues. Load data in smaller batches or use specific paths.

</details>

<details>

<summary>Why does the temporary download fail?</summary>

It fails when disk space is low or permissions are missing. Ensure enough disk space and write permissions.

</details>

<details>

<summary>Why do I see “No containers found”?</summary>

This happens when the storage account has no containers or access is restricted. Create a container and verify permissions.

</details>

***

Copyright © 2026, OvalEdge LLC, Peachtree Corners, GA, USA.


