Azure Data Lake
Azure Data Lake is a highly scalable, secure storage service that supports processing and analytics across platforms. It can store structured, semi-structured, and unstructured data.
In the OvalEdge application, the Azure Data Lake connector allows you to crawl, profile, and sample-profile the files and folders in an Azure Data Lake instance.
Prerequisites
The following are prerequisites for connecting to the Azure Data Lake.
The APIs/drivers used by the connector are given below:
Sl.No | Driver / API | Details
1 | API | The connectivity to Azure Data Lake is via ADL, a common library included in the platform.
Server User Permission
By default, the service account provided for the connector will be used for any user operations. The minimum privileges required are:
Operation | Access Permission
Connection Validation | Read
Crawl Files/Folders | Read
Catalog Files/Folders | Read
Profile Files/Folders | Read
Technical Specification
The connector capabilities are shown below:
Crawling
Feature | Supported Objects | Remarks
Crawling | Data Storage Containers | When a root file or folder is crawled, all folders and files in that root path are cataloged by default.
Profiling
Feature | Supported Objects | Details
File Profiling | Row Count, Column Count, View Sample Data | Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC
Sample Profiling | Supported | -
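To make the File Profiling statistics concrete, the following minimal Python sketch computes a row count, a column count, and a small data sample for CSV content, and checks a file name against the supported file types. It only illustrates what the statistics mean and is not OvalEdge's profiler. The names SUPPORTED_EXTENSIONS, is_supported, and profile_csv, the sample size, and the rule that the first row is a header are assumptions made for this example.

```python
import csv
import io
from pathlib import PurePosixPath

# File types listed as supported for profiling in this document.
SUPPORTED_EXTENSIONS = {"csv", "xls", "xlsx", "json", "avro", "parquet", "orc"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the supported types."""
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS


def profile_csv(text: str, sample_size: int = 5) -> dict:
    """Compute row count, column count, and sample rows for CSV text.

    Assumption: the first row is a header and is not counted as data.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = (rows[0], rows[1:]) if rows else ([], [])
    return {
        "row_count": len(data),
        "column_count": len(header),
        "sample": data[:sample_size],
    }


# Hypothetical file content, used only for illustration.
example = "id,name,city\n1,Asha,Pune\n2,Liam,Austin\n3,Mei,Osaka\n"
stats = profile_csv(example, sample_size=2)
print(stats["row_count"], stats["column_count"], stats["sample"])
print(is_supported("raw/sales/2024.CSV"), is_supported("raw/notes.txt"))
```

The extension check is case-insensitive here, which is a design choice of the sketch and not a documented behavior of the connector.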
Connection Details
To connect to the Azure Data Lake using the OvalEdge application, complete the following steps:
Log in to the OvalEdge application.
Navigate to the Administration > Connectors module.
Click the + icon. The Add Connection with Search Connector pop-up window is displayed.
Select Azure Data Lake as the connection type. The Add Connector pop-up window with Azure Data Lake specific details is displayed.
Field Name
Description
Connector Type
By default, the selected connection type is displayed as Azure Data Lake.
Credential Manager
From the drop-down menu, select where you want to store your credentials:
OE Credential Manager: The Azure Data Lake connection is configured with the basic username and password of the service account, which OvalEdge uses in real time when it establishes a connection to Azure Data Lake. If you select OE Credential Manager, you must add the credentials manually.
HashiCorp: The credentials are stored in the HashiCorp database server and fetched by OvalEdge from HashiCorp.
AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server and fetched by OvalEdge from AWS Secrets Manager.
For more information on Azure Key Vault, refer to Azure Key Vault.
For more information on Credential Manager, refer to Credential Manager.
License Add-Ons
By default, every connector has a Base Connector License that allows you to crawl and profile a data source to obtain its metadata and statistical information.
OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.
Select the Auto Lineage Add-On license to enable automatic construction of the lineage of data objects for a connector that supports the Lineage feature.
Select the Data Quality Add-On license to identify, report, and resolve data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
Select the Data Access Add-On license to enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled.
Connector Environment
The Connector Environment drop-down menu lets you select the environment configured for the connector, for example PROD or STG. The available values are based on the items configured for connector.environment in the OvalEdge configuration.
The purpose of the environment field is to help you identify which type of system environment (Production, STG, or QA) a connector connects to.
Note: The steps to set up environment variables are explained in the prerequisite section.
Connector Name*
Enter a connector name in the Connector Name text box. This name is used to refer to the Azure Data Lake connection in the OvalEdge application.
Authentication Type
The Authentication Type drop-down list allows you to select either ADL String or ADL Service Principal.
ADL String:
ADL Connection String*: Enter the connection string generated for the Azure storage account. Example: DefaultEndpointsProtocol=https;AccountName=ovaledgefileaccess;AccountKey=...
ADL Service Principal:
Client Id*: After you have registered your application, you can find the application ID (or client ID) as follows:
Select Microsoft Entra ID (previously Azure Active Directory) in the left sidebar.
Click Enterprise applications.
Click All applications.
Select the application which you have created.
Click Properties.
Copy the Application ID.
Client Secret*: The application needs a client secret to prove its identity when requesting a token. For security reasons, Microsoft limits the lifetime of client secrets to 24 months or less and strongly recommends a lifetime of less than 12 months. To create a client secret:
Select Microsoft Entra ID (previously Azure Active Directory) in the left sidebar.
Click App registrations.
Select the application which you have created.
Click All settings.
Click Keys.
Enter a key description and select the duration.
Click Save.
Copy and store the key value. You won't be able to retrieve it after you leave this page.
Tenant Id*: The tenant ID identifies the Microsoft Entra ID (previously Azure Active Directory) tenant to use for authentication. It is also referred to as the directory ID. To find it:
Select Microsoft Entra ID (previously Azure Active Directory) in the left sidebar.
Click Properties.
Copy the directory ID.
ADL Endpoint*: The URL used to interact with the ADL storage account.
Default Governance Roles*
Select a specific user or team for each governance role (Steward, Custodian, Owner) assigned to manage the data asset.
Note: The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security | Governance Roles section.
Admin Roles*
Select the required admin roles for this connector.
To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button. The responsibility of the Integration Admin includes configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button. The Security and Governance Admin is responsible for:
Configuring role permissions for the connector and its associated data objects.
Adding admins to set permissions for roles on the connector and its associated data objects.
Updating governance roles.
Creating custom fields.
Developing Service Request templates for the connector.
Creating Approval workflows for the templates.
No of Archive Objects*
The number of archive objects indicates the number of recent metadata modifications made to a dataset at a remote/source location. By default, the archive objects feature is deactivated. However, users may enable it by clicking the Archive toggle button and specifying the number of objects they wish to archive.
Select Bridge
Select the NO Bridge option if no bridge is available for the connector.
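The two Authentication Type options described above take differently shaped inputs, which the following Python sketch illustrates. It is a hedged illustration only: how OvalEdge uses these values internally is not documented here. The first function splits an ADL connection string into its key/value parts. The second builds, without sending, an OAuth 2.0 client credentials token request from the Tenant Id, Client Id, and Client Secret. The token endpoint format and the storage scope come from the general Microsoft identity platform conventions, not from this document, and all function names and sample values are hypothetical placeholders.

```python
from urllib.parse import urlencode


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dictionary.

    Only the first '=' in each pair is treated as the separator,
    because base64 account keys often end with '=' padding.
    """
    parts = {}
    for pair in conn_str.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Malformed segment: {pair!r}")
        parts[key] = value
    return parts


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return the URL and form body of a client credentials token request.

    Nothing is sent over the network; this only assembles the request.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumed scope for Azure Storage resources.
        "scope": "https://storage.azure.com/.default",
    })
    return url, body


# Placeholder values only; substitute the values copied from the Azure portal.
conn = parse_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=exampleaccount;"
    "AccountKey=PLACEHOLDERKEY=="
)
print(sorted(conn))

url, body = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="placeholder-secret",
)
print(url)
```

Checking that the connection string contains an AccountName and AccountKey, or that the three service principal values are non-empty, before saving the connector can catch copy-and-paste errors early.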

Connection Settings
Crawler
Sl.No | Property | Description
1 | Crawler Options | FileFolders/Buckets is enabled by default.
2 | Crawler Rules | Include and exclude regular expressions apply only to FileFolders and Buckets, not to individual files.
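As an illustration of how include and exclude regular expressions can narrow the set of crawled folders, here is a minimal Python sketch. It is not the OvalEdge crawler: the function name filter_folders, the use of partial matching (re.search), and the rule that an exclude match overrides an include match are all assumptions made for this example. Because the rules do not apply to files, the sketch takes folder paths only.

```python
import re


def filter_folders(folders, include=None, exclude=None):
    """Keep folders that match any include pattern and no exclude pattern.

    An empty include list means "include everything".
    """
    include_patterns = [re.compile(p) for p in (include or [])]
    exclude_patterns = [re.compile(p) for p in (exclude or [])]
    kept = []
    for folder in folders:
        if include_patterns and not any(p.search(folder) for p in include_patterns):
            continue  # not covered by any include rule
        if any(p.search(folder) for p in exclude_patterns):
            continue  # assumption: exclude wins over include
        kept.append(folder)
    return kept


# Hypothetical folder paths and rules, used only for illustration.
folders = ["raw/sales", "raw/tmp", "curated/sales", "archive/2020"]
result = filter_folders(
    folders,
    include=[r"^(raw|curated)/"],
    exclude=[r"/tmp$"],
)
print(result)
```

Testing a pattern against a few known folder paths in this way before saving a crawler rule helps confirm that it selects the intended folders.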
Profiler
Sl.No | Property | Description
1 | Profile Options | No profile options are available.
2 | Profile Rules | No profile rules are available.
Points to note:
Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC.
File Manager shows details only for the files and folders that the user has access to.
Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA

