Apache Impala
This article describes the Apache Impala connector, which enables streamlined metadata management through crawling, profiling, data preview, and manual lineage building, with secure authentication via Credential Manager.

Overview
Connector Details
Connector Category: RDBMS
Connector Version: Release 6.3.4
Releases Supported (Available from): Legacy Connector
Connectivity [How the connection is established with Apache Impala]: JDBC
Verified Apache Impala Version: 2.5.31
Connector Features
Crawling: ✅
Delta Crawling: ❌
Profiling: ✅
Query Sheet: ✅
Data Preview: ✅
Auto Lineage: ❌
Manual Lineage: ✅
Secure Authentication via Credential Manager: ✅
Data Quality: ❌
DAM (Data Access Management): ❌
Bridge: ✅
Metadata Mapping
The following objects are crawled from Apache Impala and mapped to the corresponding UI assets.
Schema | Schema name | Schema | Databases | Schema
Table | Table Name | Table | Tables | Table
Table | Table Type | Type | Tables | Table
Table | Table Comments | Source Description | Descriptions | Source Description
Columns | Column Name | Column | Table Columns | Columns
Columns | Data Type | Column Type | Table Columns | Columns
Columns | Description | Source Description | Table Columns | Columns
Views | View Name | View | Tables | Views
Set up a Connection
Prerequisites
The prerequisites to establish a connection:
Whitelisting Ports
Whitelist the inbound port 21050 to allow OvalEdge to connect to the Apache Impala Server database.
Apache Impala uses port 21050 by default. If a different port is configured, specify that port during connection setup and whitelist it so that OvalEdge can communicate with the Apache Impala Server.
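Before configuring the connector, it can help to confirm that the whitelisted port is actually reachable from the machine that will open the connection. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it only checks that a TCP connection can be opened, not that Impala itself is healthy.

```python
import socket


def is_port_reachable(host: str, port: int = 21050, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling is_port_reachable with the Impala host name and the configured port returns False when a firewall still blocks the port or the host name cannot be resolved.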
Service Account User Permissions
Use a dedicated service account to establish the connection to the data source, configured with the following minimum set of permissions.
Crawling & Profiling | Schema | USAGE on the database | USAGE
Crawling & Profiling | Tables | USAGE on the database; SELECT privilege on tables | SELECT & USAGE
Crawling & Profiling | Table Columns | USAGE on the database; SELECT privilege on tables | SELECT & USAGE
Crawling & Profiling | Primary Keys (PK) and Foreign Keys (FK) | USAGE on the database; SELECT privilege on tables | SELECT & USAGE
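The minimum permissions listed above can be expressed as a small lookup, which may be useful when reviewing a service account before the first crawl. This is only an illustrative sketch: the names REQUIRED_PRIVILEGES and missing_privileges are hypothetical, and the set of granted privileges is assumed to have been gathered separately from the source system.

```python
from typing import Dict, Set

# Minimum privileges per crawled object type, as listed in this article.
REQUIRED_PRIVILEGES: Dict[str, Set[str]] = {
    "Schema": {"USAGE"},
    "Tables": {"SELECT", "USAGE"},
    "Table Columns": {"SELECT", "USAGE"},
    "Primary Keys (PK) and Foreign Keys (FK)": {"SELECT", "USAGE"},
}


def missing_privileges(granted: Set[str]) -> Dict[str, Set[str]]:
    """Map each object type to the required privileges absent from `granted`."""
    normalized = {privilege.upper() for privilege in granted}
    return {
        obj: needed - normalized
        for obj, needed in REQUIRED_PRIVILEGES.items()
        if needed - normalized
    }
```

An empty result means the account holds every privilege in the list; any remaining entries show which object types would be affected.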
Connection Configuration Steps
Users must have the Connector Creator role to configure a new connection.
Log in to OvalEdge, go to Administration > Connectors, click + (New Connector), search for Impala, and complete the required parameters.
Connector Type
By default, "Apache Impala" is displayed as the selected connector type.
Authentication*
The following two types of authentication are supported for Apache Impala Server:
Kerberos Authentication
Non-Kerberos Authentication
Parameters for Kerberos Authentication:
Credential Manager*
Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selection.
Supported Credential Managers:
Database
HashiCorp
AWS Secrets Manager
Azure Key Vault
Connector Name*
Enter a unique name for the Apache Impala connection (Example: "Apache Impala_Prod").
Connector Environment
Select the environment (Example: PROD, STG) configured for the connector.
Connector Description
Enter the description related to the connector.
Server*
Enter the host name or IP address of the server where Apache Impala is hosted.
Port*
Apache Impala uses port 21050 by default. The port number can be modified as needed.
Database*
The ‘Database’ field specifies the default schema to connect to within the Impala server.
Example: If the target database is sales_db, enter sales_db to connect directly instead of the default schema.
Driver*
By default, Apache Impala uses 'com.cloudera.impala.jdbc41.Driver'. This field is not editable.
Principal
The Principal field specifies the Kerberos principal that the client will use to authenticate to the Impala service. It identifies the service account for Impala in the Kerberos realm.
Connection String
Configure the connection string for the Impala server:
Automatic Mode: The system generates a connection string based on the provided credentials.
Manual Mode: Enter a valid connection string manually.
Replace placeholders with actual server details:
{server} refers to the Impala host or IP address.
{sid} refers to the database name (schema).
Authentication Plugins: jdbc:hive2://{server}:2xxx/{sid};principal=impala/undefined
This is the default authentication string used for connecting to Impala. The principal parameter specifies the Impala service principal for authentication.
Keytab
The Keytab field specifies the path to the Kerberos keytab file containing the principal’s credentials. It is used to securely authenticate the client to the Impala server without manual password entry.
Krb5-Configuration File*
The Krb5-Configuration File field specifies the path to the krb5.conf file used for Kerberos authentication. It provides the necessary Kerberos realm and KDC information for Impala to validate the user's credentials.
Plugin Server
Enter the server name when running as a plugin server.
Plugin Port
Enter the port number on which the plugin is running.
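Because the Keytab and Krb5-Configuration File fields both take file paths, a quick pre-check of those paths can rule out one common cause of failed validation. The sketch below assumes it is run on the machine where those paths are expected to resolve; the function name is hypothetical, and it only confirms that the files exist and are readable, not that their contents are valid.

```python
import os
from typing import List


def check_kerberos_files(keytab_path: str, krb5_conf_path: str) -> List[str]:
    """Return a list of problems found; an empty list means both files are readable."""
    problems = []
    checks = (
        ("Keytab", keytab_path),
        ("Krb5-Configuration File", krb5_conf_path),
    )
    for label, path in checks:
        if not os.path.isfile(path):
            problems.append(f"{label}: no file at {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"{label}: file at {path} is not readable")
    return problems
```

Each returned message names the connector field it relates to, so the corresponding path can be corrected before clicking Validate.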
Parameters for Non-Kerberos Authentication:
Credential Manager*
Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selection.
Supported Credential Managers:
Database
HashiCorp
AWS Secrets Manager
Azure Key Vault
Connector Name*
Enter a unique name for the Apache Impala connection (Example: "Apache Impala_Prod").
Connector Environment
Select the environment (Example: PROD, STG) configured for the connector.
Connector Description
Enter the description related to the connector.
Server*
Enter the host name or IP address of the server where Apache Impala is hosted.
Port*
Apache Impala uses port 21050 by default. The port number can be modified as needed.
Database*
The ‘Database’ field specifies the default schema to connect to within the Impala server.
Example: If the target database is sales_db, enter sales_db to connect directly instead of the default schema.
Driver*
By default, Apache Impala uses 'com.cloudera.impala.jdbc41.Driver'. This field is not editable.
Username*
The username field specifies the user account used to connect to the Impala server.
Password*
The Password field should contain the user’s password associated with the provided username. It is used to authenticate the connection when establishing a session with the Impala server.
Connection String
Configure the connection string for the Impala server:
Automatic Mode: The system generates a connection string based on the provided credentials.
Manual Mode: Enter a valid connection string manually.
Replace placeholders with actual server details:
{server} refers to the Impala host or IP address.
{sid} refers to the database name (schema).
Authentication Plugins: jdbc:hive2://{server}:2xxx/{sid};principal=impala/undefined
This is the default authentication string used for connecting to Impala. The principal parameter specifies the Impala service principal for authentication.
Plugin Server
Enter the server name when running as a plugin server.
Plugin Port
Enter the port number on which the plugin is running.
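The placeholder substitution described for Manual Mode can be sketched as follows. The jdbc:hive2 prefix and the {server} and {sid} placeholders are taken from the template in this article; the {port} slot and the function name are assumptions made for illustration, since the template does not spell out the port. Any authentication suffix, such as the principal parameter, is left out and would need to be appended in whatever form the driver expects.

```python
def build_connection_string(server: str, sid: str, port: int = 21050) -> str:
    """Fill the {server} and {sid} placeholders of a manual-mode connection string."""
    if not server or not sid:
        raise ValueError("server and sid must both be non-empty")
    # {port} is an assumed placeholder; the article's template elides the port.
    template = "jdbc:hive2://{server}:{port}/{sid}"
    return template.format(server=server, port=port, sid=sid)
```

With a placeholder host such as impala.example.com and the sales_db database from the earlier example, the function returns a string that can be pasted into the Connection String field and adjusted as needed.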
Default Governance Roles
Default Governance Roles*
Select the appropriate users or teams for each governance role from the drop-down list. All users and teams configured in OvalEdge Security are displayed for selection.
Admin Roles
Admin Roles*
Select one or more users from the dropdown list for Integration Admin and Security & Governance Admin. All users configured in OvalEdge Security are available for selection.
No of Archive Objects
No Of Archive Objects*
This setting specifies how many recent metadata changes to a dataset at the source are archived. It is off by default. To enable it, toggle the Archive button and specify the number of objects to archive.
Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.
Bridge
Select Bridge*
If applicable, select the bridge from the drop-down list.
The drop-down list displays all active bridges configured in OvalEdge. These bridges enable communication between data sources and OvalEdge without altering firewall rules.
After entering all connection details, the following actions can be performed:
Click Validate to verify the connection.
Click Save to store the connection for future use.
Click Save & Configure to apply additional settings before saving.
The saved connection will appear on the Connectors home page.
Manage Connector Operations
Crawl/Profile
To perform crawl and profile operations, users must be assigned the Integration Admin role.
The Crawl/Profile button allows users to select one or more schemas for crawling and profiling.
Navigate to the Connectors page and click Crawl/Profile.
Select the schemas to crawl.
The Crawl option is selected by default. Click the Crawl & Profile radio button to enable both operations.
Click Run to collect metadata from the connected source and load it into the Data Catalog.
After a successful crawl, the information appears in the Data Catalog > Databases tab.
The Schedule checkbox allows automated crawling and profiling at defined intervals, from a minute to a year.
Click the Schedule checkbox to enable the Select Period drop-down.
Select a time period for the operation from the drop-down menu.
Click Schedule to initiate metadata collection from the connected source.
The system will automatically execute the selected operation (Crawl or Crawl & Profile) at the scheduled time.
Other Operations
The Connectors page in OvalEdge provides a centralized view of all configured connectors, including their health status.
Managing connectors includes:
Connectors Health: Displays the current status of each connector, with a green icon for active connections and a red icon for inactive connections, helping monitor connectivity to data sources.
Viewing: Click the Eye icon next to the connector name to view connector details, including Tables, Views, and Columns.
Nine Dots Menu Options:
To view, edit, validate, configure, or delete connectors, click on the Nine Dots menu.
Edit Connector: Update and revalidate the data source.
Validate Connector: Check the integrity of the connection.
Settings: Modify connector settings.
Crawler: Configure data extraction.
Profiler: Customize data profiling rules and methods.
Query Policies: Define query execution rules based on roles.
Access Instructions: Add notes on how data can be accessed.
Business Glossary Settings: Manage term associations at the connector level.
Connection Pooling: Allows configuring parameters such as maximum pool size, idle time, and timeouts directly within the application.
Others: Configure notification recipients for metadata changes.
Delete Connector: Remove a connector with confirmation.
Connectivity Troubleshooting
If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.
Error 1: Handler dispatch failed: java.lang.NoSuchFieldError: DEFAULT_MAX_WAIT
Description: This error occurs when the Impala connector (or a dependent library, such as the JDBC/ODBC driver or Hadoop/Hive libraries) tries to access a field named DEFAULT_MAX_WAIT that does not exist in the loaded version of the class.
Resolution: Ensure that the Impala JDBC/ODBC driver version matches the Impala server version. Confirm that any Hadoop/Hive libraries on the classpath are compatible with the connector.
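A NoSuchFieldError of this kind often points to two versions of the same library being present on the classpath. As a rough aid, the sketch below scans a directory of JAR files and reports artifacts that appear with more than one version. It assumes the libraries sit in a single directory and follow the common "name-version.jar" naming pattern; the directory location, the function name, and the naming pattern are all assumptions, and JARs that do not match the pattern are ignored.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Matches file names of the form "<artifact>-<version>.jar",
# where the version starts with a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_conflicting_jars(lib_dir: str) -> Dict[str, List[str]]:
    """Return artifacts found in lib_dir with more than one version."""
    versions = defaultdict(set)
    for jar in Path(lib_dir).glob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:
            versions[match["artifact"]].add(match["version"])
    # Keep only artifacts that occur with at least two distinct versions.
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

Any artifact listed in the result is a candidate for cleanup: keeping only the version that is compatible with the connector and the driver may resolve the error.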
Copyright © 2025, OvalEdge LLC, Peachtree Corners, GA, USA.

