Azure Event Hub
Azure Event Hub is one type of message broker available on Azure. It is useful for reliably ingesting high volumes of messages. The connector supports crawling streams and profiling messages.

Connector Capabilities
Technical Specification
The technical specification for the Azure Event Hub connector covers the Crawler, Profiler, and Query Execution, as well as supported objects, supported data types, and user permissions.
Crawler
Tables: Topics in EventHub are represented as tables.
Table columns: Messages in EventHub are represented as columns; they are crawled during sample profiling.
Profiling
Table Profiling: Row count, column count, view sample data
Views Profiling: Not supported
Column Profiling: Min, max, null count, distinct count, top 50 values
Full Profiling: Not supported
Sample Profiling: Supported
Note: Lineage and Query are not supported.
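The table and column profiling metrics listed above can be illustrated with a minimal Python sketch. It assumes each sampled message has already been decoded into a flat dictionary, treats a missing key as a null, and uses a hypothetical function name (`profile_columns`); it is not the connector's actual implementation.

```python
from collections import Counter

def profile_columns(messages, top_n=50):
    """Profile a sample of decoded messages (flat dicts) column by column."""
    # Assumption: every key seen in any sampled message becomes a column.
    columns = sorted({key for msg in messages for key in msg})
    stats = {}
    for col in columns:
        values = [msg.get(col) for msg in messages]   # missing key -> None
        non_null = [v for v in values if v is not None]
        counts = Counter(non_null)                    # values must be hashable
        stats[col] = {
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
            "null_count": len(values) - len(non_null),
            "distinct": len(counts),
            "top_values": counts.most_common(top_n),
        }
    return {
        "row_count": len(messages),
        "column_count": len(columns),
        "columns": stats,
    }

# Made-up sample data, for illustration only.
sample = [
    {"device": "a1", "temp": 21},
    {"device": "b2", "temp": 19},
    {"device": "a1", "temp": None},
]
report = profile_columns(sample)
print(report["row_count"], report["column_count"])
print(report["columns"]["temp"])
```

Because the statistics are computed only on the sampled messages, they may differ from values computed over the whole stream.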
Connection Details
To use the EventHub connector, the details specified in the following section should be available.
Prerequisites
The APIs/drivers used by the connector are given below:
Driver(s): No external driver required
Internal jars: org.apache.kafka.clients (5.3.0)
Internal jars: io.confluent.kafka-schema-registry-client (5.3.0)
Service Account User: Service account user with read privileges
User Permission
By default, the service account provided for the connector will be used for any query operations. If the service account has write privileges, Insert/Update/Delete queries can be executed.
The minimum privileges required are:
Connection Validation: SELECT and USAGE
Crawling: SELECT, USAGE, REFERENCE, and EXECUTION
Profiling: No permission is required to profile
Note: Navigate to Configuration > Users & Roles for roles and permissions.
Add Connection
To connect to the EventHub using the OvalEdge application, complete the following steps.
Log in to the OvalEdge application.
Navigate to the Administration > Connectors module.
To add a new connection, click the +AddNewConnector icon. The Manage Connection pop-up window with Search Connector is displayed.
Select the connection type as EventHub. The Manage Connection pop-up window with EventHub-specific details is displayed.
The following are the field attributes required for the connection.
Connection Type
Mandatory
Select the EventHub connector.
By default, the selected connection type is displayed as EventHub. If required, the connection type can be changed, and the fields displayed depend on the connector selected.
License Type
Mandatory
By default, the license type is Auto Lineage.
License Type:
In a license type, the permissions are specified based on the customer's requirements. The user can select the license type as Standard or Auto Lineage. Connector licenses are categorized as follows:
(i) Standard: Standard connectors may not have Auto Lineage functionality. Lineage is not built for the selected database.
(ii) Auto Lineage: Auto Lineage connectors additionally have Auto Lineage functionality. Lineage is built for the selected database.
See License Types for more information.
Connection Name
Mandatory
Enter a connection name for the EventHub connection. The name that you specify is a reference name that identifies your EventHub connection in OvalEdge. Example: EventHub Connection1
Broker URL*
Mandatory
URL of the instance (on-premises/cloud-based). Example: oval-EventHub.csklygkwz3dx.us-east-1.rds.amazonaws.com
Cluster Name *
Mandatory
(Default)
Consumer Group
Optional
Provide the Consumer Group
JAAS Config Path*
Mandatory
Provide the path to the JAAS configuration file containing the EventHub secret key used for validation.
Registry URL
Optional
Enter the Registry URL
Directory ID (Tenant ID)
Optional
It is optional to provide the Directory ID.
Application ID (Client ID)
Optional
It is optional to provide the Application ID.
Client Secret
Optional
It is optional to enter the Client Secret
Default Governance Roles
Mandatory
From the dropdown list, select Stewards, Custodian, and Owner.
After entering the connection details in the required fields, click the Save button, or click Save & Configure to establish the connection and configure its settings. When you click the Save & Configure button, the Connection Settings pop-up window is displayed, where you can configure the connection settings for the selected connector.
Note: The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.
Click the Validate button to validate the entered connection details.
Note: You can either save the connection details first, or validate the connection and then save it.
Error Validation Details
The following are the possible error messages encountered during validation.
1. Error message: /volume/OE_DATA/Connections/EventHub/con-config(No Such File or Directory)
Description: Incorrect JAAS path
2. Error message: Invalid credentials passed!
Description: Incorrect password
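The first error indicates that no file exists at the configured JAAS Config Path. The Python sketch below shows one way to check a path before validating, and what a JAAS file for the Kafka-compatible endpoint of Azure Event Hubs commonly looks like (SASL/PLAIN, with the literal user name `$ConnectionString` and the namespace connection string as the password). Whether OvalEdge expects exactly this file layout is an assumption; the function names and the placeholder connection string are hypothetical.

```python
import os
import tempfile

def render_jaas_config(connection_string):
    """Return JAAS file text for SASL/PLAIN login (assumed layout)."""
    return (
        "KafkaClient {\n"
        "  org.apache.kafka.common.security.plain.PlainLoginModule required\n"
        '  username="$ConnectionString"\n'
        f'  password="{connection_string}";\n'
        "};\n"
    )

def check_jaas_path(path):
    """Return None if the file exists, else a message shaped like error 1."""
    if os.path.isfile(path):
        return None
    return f"{path}(No Such File or Directory)"

# Placeholder connection string: replace every <...> part with real values.
text = render_jaas_config(
    "Endpoint=sb://<namespace>.servicebus.windows.net/;"
    "SharedAccessKeyName=<key-name>;SharedAccessKey=<key>"
)

with tempfile.TemporaryDirectory() as tmp:
    jaas_path = os.path.join(tmp, "con-config")
    missing = check_jaas_path(jaas_path)   # file has not been written yet
    with open(jaas_path, "w") as fh:
        fh.write(text)
    found = check_jaas_path(jaas_path)     # file now exists

print(missing is not None, found is None)
```

Running a check of this kind on the OvalEdge host, with the same path entered in the connection form, helps distinguish a wrong path from wrong credentials.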
Connection Settings
Once connectivity is established, additional configurations for crawling and profiling can be specified:
Crawler
The crawler has several settings tabs for crawling and profiling options. Crawler options are available for all connections and differ based on the connection selected. At least one crawler option must be selected.
Crawler options
Tables and Columns: This option discovers the tables and columns and brings them into OvalEdge. It is the default option for crawling.
Crawler Rule
In Crawler Rules, you can write regex rules that include and/or exclude schemas, tables, views, columns, procedures, and functions whose names start with, end with, or contain the characters defined in the rule.
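As an illustration of include and exclude rules, here is a minimal Python sketch. It uses Python's `re` syntax, which may differ from the regex dialect OvalEdge accepts; the helper name `is_crawled` and the topic names are hypothetical.

```python
import re

def is_crawled(name, include=None, exclude=None):
    """True if name matches an include rule (when given) and no exclude rule."""
    if include and not any(re.search(p, name) for p in include):
        return False
    if exclude and any(re.search(p, name) for p in exclude):
        return False
    return True

topics = ["orders_raw", "orders_clean", "tmp_orders", "audit_log"]

include = [r"^orders"]         # names that start with "orders"
exclude = [r"_raw$", r"tmp"]   # names that end with "_raw" or contain "tmp"

selected = [t for t in topics if is_crawled(t, include, exclude)]
print(selected)  # ['orders_clean']
```

The anchors do the work here: `^` pins a pattern to the start of a name, `$` pins it to the end, and a pattern with neither matches anywhere in the name.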
Profiler
Profiling a data source helps identify relationships between tables at the entity level and patterns between them. Many attributes can be specified in the profile settings. Once the profiling settings have been configured, go back to the Crawler screen and click “Crawl/Profile” to begin profiling.
Note: Profiling will be successful if “All” and “Current Day” are matched.
The attributes are as follows:
Order
The order number is the sequence in which profiling is done.
Day
The day of the week on which profiling is set to run.
Start/End Time
The start and end times between which profiling is set to run.
Number of Threads
A thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries that are executed on the data source.
Profile Type
There are four profile types:
Sample - Profiling is performed on the given Sample Profile Size. The column statistics (such as Min, Max, Distinct, and Null Count) will differ from those of a full profile because they are calculated only on the sample. To run a sample profile, select the profile type “Sample” and enter a sample profile size (the count of records to be profiled).
Auto - For Auto profiling:
If the Row Constraint checkbox is selected (set to True) and the configured Rowcount Limit (for example, 100) is less than the total table row count (1000), sample profiling is performed using the count given in the Sample Profile Size.
If the Row Constraint checkbox is selected (set to True) and the configured Rowcount Limit (for example, 1000) is greater than the total table row count (100), the query is executed without considering the count given in the Rowcount Limit.
Note: A profile type set to “Auto” always depends on the Rowcount Limit, and the Row Count Constraint must be set to “True”.
Query - For Query profiling:
If the table row count is less than the Rowcount Limit, profiling is executed on the entire table.
If the table row count exceeds the Rowcount Limit, profiling is skipped for that table to avoid performance issues.
Disabled - This profile type prevents profiling on the selected data source.
Row Count Constraint
The Row Count Constraint option is applicable only when the Profile Type is set to Auto.
If the Row Constraint checkbox is selected (set to True) and the configured Rowcount Limit (for example, 100) is less than the total table row count (1000), sample profiling is performed using the count given in the Sample Profile Size.
If the Row Constraint checkbox is selected (set to True) and the configured Rowcount Limit (for example, 1000) is greater than the total table row count (100), the query is executed without considering the count given in the Rowcount Limit.
Row Count Limit
Enter the maximum number of rows that should be considered for profiling.
Sample Data Count
Enter the total number of rows to see within the table data page in the Catalog.
Sample Profile Size
Enter the number of rows to be considered for sample profiling.
Query Timeout
Enter the number of seconds for the query to run on the remote database before it times out.
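The way Profile Type, Row Count Constraint, Row Count Limit, and Sample Profile Size interact can be summarized in a small Python sketch. The function name, the action labels, and the handling of cases the text does not cover (row count equal to the limit, Auto with the constraint unchecked, a sample size larger than the table) are assumptions for illustration, not documented behavior.

```python
def plan_profiling(profile_type, table_rows, row_limit, sample_size,
                   row_constraint=True):
    """Return (action, rows_to_profile) for one table."""
    if profile_type == "Disabled":
        return ("skip", 0)
    if profile_type == "Sample":
        # Assumption: the sample cannot exceed the rows available.
        return ("sample", min(sample_size, table_rows))
    if profile_type == "Auto":
        if not row_constraint:
            # Not described in the text; left undecided in this sketch.
            return ("unspecified", None)
        if row_limit < table_rows:
            return ("sample", min(sample_size, table_rows))
        # Assumption: a table exactly at the limit is profiled in full.
        return ("full", table_rows)
    if profile_type == "Query":
        if table_rows < row_limit:
            return ("full", table_rows)
        # Assumption: a table exactly at the limit is skipped.
        return ("skip", 0)
    raise ValueError(f"unknown profile type: {profile_type}")

# The two Auto cases from the text, with a Sample Profile Size of 50.
print(plan_profiling("Auto", table_rows=1000, row_limit=100, sample_size=50))
print(plan_profiling("Auto", table_rows=100, row_limit=1000, sample_size=50))
```

The first call falls back to sampling because the table exceeds the limit; the second profiles the whole table because it fits within the limit.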
Access Instruction
It allows the admin user to write instructions that guide the business user in crawling the data source. Instructions are simply related information about the data source connection. They can include resources such as links, images, or videos that help business users of that particular data source connection.
For example: When an admin user saves the access instructions and crawls the schema, the saved instructions appear in the data catalog when the schema is clicked. They provide information or guidelines on a specific schema.
You can provide the instructions on the Connectors > Setting page.
Click on the Access Instruction tab.
Enter the instructions.
Click the Save Changes button. Once you add the access instructions for a specific connection in the crawler settings, they appear in the connection hierarchy, like a database.
Other
When you navigate to the Others tab, the Send Metadata Changes Notifications to and Context URL sections are displayed.
Send Metadata Notification to:
Select whether notifications need to be sent to the Data Owner and Data Steward under the specific Roles.
Select the desired role from the Roles dropdown menu and click on the Save Changes button. The notifications will be sent to the selected Data Owner and Data Steward.
Context URL:
Enter the browser URL for the selected data source.
Example: https://azure.microsoft.com/en-in/services/event-hubs/#overview is entered for the EventHub connection.
Note: To add multiple URLs, click on the + Add New URL option. The new textbox to provide the URL details is displayed in the Context URL section.
Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA