Apache Impala

This article describes the OvalEdge connector for Apache Impala. The connector supports metadata crawling, profiling, data preview, and manual lineage building, and it authenticates securely through Credential Manager.

Overview

Connector Details

- Connector Category: RDBMS
- Connector Version: Release 6.3.4
- Releases Supported (Available from): Legacy Connector
- Connectivity (how the connection is established with Apache Impala): JDBC
- Verified Apache Impala Version: 2.5.31

Connector Features

| Feature | Availability |
| --- | --- |
| Crawling | ✅ |
| Delta Crawling | ❌ |
| Profiling | ✅ |
| Query Sheet | ✅ |
| Data Preview | ✅ |
| Auto Lineage | ❌ |
| Manual Lineage | ✅ |
| Secure Authentication via Credential Manager | ✅ |
| Data Quality | ❌ |
| DAM (Data Access Management) | ❌ |
| Bridge | ✅ |

Metadata Mapping

The following objects are crawled from Apache Impala and mapped to the corresponding UI assets.

| Apache Impala Object | Apache Impala Attribute | OvalEdge Attribute | OvalEdge Category | OvalEdge Type |
| --- | --- | --- | --- | --- |
| Schema | Schema name | Schema | Databases | Schema |
| Table | Table Name | Table | Tables | Table |
| Table | Table Type | Type | Tables | Table |
| Table | Table Comments | Source Description | Descriptions | Source Description |
| Columns | Column Name | Column | Table Columns | Columns |
| Columns | Data Type | Column Type | Table Columns | Columns |
| Columns | Description | Source Description | Table Columns | Columns |
| Views | View Name | View | Tables | Views |

Set up a Connection

Prerequisites

The prerequisites to establish a connection:

Whitelisting Ports

Whitelist the inbound port 21050 to allow OvalEdge to connect to the Apache Impala Server database.

Apache Impala uses port 21050 by default. If a different port is configured, specify that port number during connection setup and whitelist it so that OvalEdge can communicate with the Apache Impala server.
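Before configuring the connector, it can help to confirm that the whitelisted port is actually reachable from the machine that will open the connection. The following is a minimal sketch, not an OvalEdge utility: the function name `is_port_reachable` is hypothetical, and the check only proves that a TCP connection can be opened, not that Impala will accept the credentials.

```python
import socket


def is_port_reachable(host: str, port: int = 21050, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `is_port_reachable("<impala-host>")` with the real host name checks the default port 21050; pass a second argument when Impala listens on a different port.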

Service Account User Permissions

Use a dedicated service account to establish the connection to the data source, configured with the following minimum set of permissions.

👨‍💻 Who can provide these permissions? The Apache Impala administrator grants these permissions, as standard accounts may not have the required access to assign them independently.

| Operation | Objects | System Tables | Access Permission |
| --- | --- | --- | --- |
| Crawling & Profiling | Schema | USAGE on the database | USAGE |
| Crawling & Profiling | Tables | USAGE on the database; SELECT privilege on tables | SELECT & USAGE |
| Crawling & Profiling | Table Columns | USAGE on the database; SELECT privilege on tables | SELECT & USAGE |
| Crawling & Profiling | Primary Keys (PK) and Foreign Keys (FK) | USAGE on the database; SELECT privilege on tables | SELECT & USAGE |

Connection Configuration Steps

Users must have the Connector Creator role to configure a new connection.

Log in to OvalEdge, go to Administration > Connectors, click + (New Connector), search for Impala, and complete the required parameters.

Fields marked with an asterisk (*) are mandatory for establishing a connection.

Field Name

Description

Connector Type

By default, "Apache Impala" is displayed as the selected connector type.

Authentication*

The following two types of authentication are supported for Apache Impala Server:

- Kerberos Authentication
- Non-Kerberos Authentication

Kerberos Authentication

Field Name

Description

Credential Manager*

Select the desired credential manager from the drop-down list. The relevant parameters are displayed based on the selection.

Supported Credential Managers:

- Database
- HashiCorp
- AWS Secrets Manager
- Azure Key Vault

Connector Name*

Enter a unique name for the Apache Impala connection (Example: "Apache Impala_Prod").

Connector Environment

Select the environment (Example: PROD, STG) configured for the connector.

Connector Description

Enter the description related to the connector.

Server*

Enter the IP address of the server where Apache Impala is hosted.

Port*

Apache Impala uses port 21050 by default. The port number can be modified as needed.

Database*

The ‘Database’ field specifies the default schema to connect to within the Impala server.

Example: If the target database is sales_db, enter sales_db to connect directly instead of the default schema.

Driver*

By default, Apache Impala uses the driver class com.cloudera.impala.jdbc41.Driver. This field is not editable.

Principal

The Principal field specifies the Kerberos principal that the client will use to authenticate to the Impala service. It identifies the service account for Impala in the Kerberos realm.

Connection String

Configure the connection string for the Impala server:

- Automatic Mode: The system generates a connection string based on the provided credentials.
- Manual Mode: Enter a valid connection string manually.

Replace placeholders with actual server details:

- {server} refers to the Impala host or IP address.
- {sid} refers to the database name (schema).

Authentication Plugins: jdbc:hive2://{server}:2xxx/{sid};principal=impala/undefined

This is the default authentication string used for connecting to Impala. The principal parameter specifies the Impala service principal for authentication.

Keytab

The Keytab field specifies the path to the Kerberos keytab file containing the principal’s credentials. It is used to securely authenticate the client to the Impala server without manual password entry.

Krb5-Configuration File*

The Krb5-Configuration File field specifies the path to the krb5.conf file used for Kerberos authentication. It provides the necessary Kerberos realm and KDC information for Impala to validate the user's credentials.

Plugin Server

Enter the server name when running as a plugin server.

Plugin Port

Enter the port number on which the plugin is running.
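Because the Keytab and Krb5-Configuration File fields both take file paths, a quick pre-check on the machine that holds those files can catch a mistyped path before the connection is validated. This is a minimal sketch under stated assumptions: the function name `missing_kerberos_files` is hypothetical and not part of OvalEdge or Impala, and the check only confirms that each path is a readable file, not that its contents are valid.

```python
import os


def missing_kerberos_files(keytab_path: str, krb5_conf_path: str) -> list[str]:
    """Return labelled paths that are not readable regular files."""
    problems = []
    for label, path in (("keytab", keytab_path), ("krb5.conf", krb5_conf_path)):
        # A usable path must point at an existing file the current user can read.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(f"{label}: {path}")
    return problems
```

An empty list means both paths passed the check; any entries identify which field to correct.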

Non-Kerberos Authentication

Field Name

Description

Credential Manager*

Select the desired credential manager from the drop-down list. The relevant parameters are displayed based on the selection.

Supported Credential Managers:

- Database
- HashiCorp
- AWS Secrets Manager
- Azure Key Vault

Connector Name*

Enter a unique name for the Apache Impala connection (Example: "Apache Impala_Prod").

Connector Environment

Select the environment (Example: PROD, STG) configured for the connector.

Connector Description

Enter the description related to the connector.

Server*

Enter the IP address of the server where Apache Impala is hosted.

Port*

Apache Impala uses port 21050 by default. The port number can be modified as needed.

Database*

The ‘Database’ field specifies the default schema to connect to within the Impala server.

Example: If the target database is sales_db, enter sales_db to connect directly instead of the default schema.

Driver*

By default, Apache Impala uses the driver class com.cloudera.impala.jdbc41.Driver. This field is not editable.

Username*

The username field specifies the user account used to connect to the Impala server.

Password*

The Password field should contain the user’s password associated with the provided username. It is used to authenticate the connection when establishing a session with the Impala server.

Connection String

Configure the connection string for the Impala server:

- Automatic Mode: The system generates a connection string based on the provided credentials.
- Manual Mode: Enter a valid connection string manually.

Replace placeholders with actual server details:

- {server} refers to the Impala host or IP address.
- {sid} refers to the database name (schema).

Authentication Plugins: jdbc:hive2://{server}:2xxx/{sid};principal=impala/undefined

This is the default authentication string used for connecting to Impala. The principal parameter specifies the Impala service principal for authentication.

Plugin Server

Enter the server name when running as a plugin server.

Plugin Port

Enter the port number on which the plugin is running.
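The placeholder substitution described for the Connection String field in Manual Mode can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `fill_placeholders`, the `{port}` placeholder, the sample host, and the simplified template, which does not reproduce the exact string OvalEdge generates. Only `{server}` and `{sid}` are placeholders named in this article.

```python
import re

# Matches placeholders such as {server} or {sid}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each {name} in the template with values[name]; fail on missing names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {{{name}}}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template: {port} is added for illustration only.
template = "jdbc:hive2://{server}:{port}/{sid}"
url = fill_placeholders(
    template, {"server": "10.0.0.12", "port": "21050", "sid": "sales_db"}
)
print(url)  # jdbc:hive2://10.0.0.12:21050/sales_db
```

Failing on an unknown placeholder, rather than leaving it in the string, makes a forgotten value visible before the connection string is pasted into the connector form.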

Default Governance Roles

Default Governance Roles*

Select the appropriate users or teams for each governance role from the drop-down list. All users and teams configured in OvalEdge Security are displayed for selection.

Admin Roles

Admin Roles*

Select one or more users from the dropdown list for Integration Admin and Security & Governance Admin. All users configured in OvalEdge Security are available for selection.

No of Archive Objects

No Of Archive Objects*

This shows the number of recent metadata changes to a dataset at the source. By default, it is off. To enable it, toggle the Archive button and specify the number of objects to archive.

Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.

Bridge

Select Bridge*

If applicable, select the bridge from the drop-down list.

The drop-down list displays all active bridges configured in OvalEdge. These bridges enable communication between data sources and OvalEdge without altering firewall rules.

After entering all connection details, the following actions can be performed:
- Click Validate to verify the connection.
- Click Save to store the connection for future use.
- Click Save & Configure to apply additional settings before saving.
The saved connection will appear on the Connectors home page.

Manage Connector Operations

Crawl/Profile

To perform crawl and profile operations, users must be assigned the Integration Admin role.

The Crawl/Profile button allows users to select one or more schemas for crawling and profiling.

1. Navigate to the Connectors page and click Crawl/Profile.
2. Select the schemas to crawl.
3. The Crawl option is selected by default. Click the Crawl & Profile radio button to enable both operations.
4. Click Run to collect metadata from the connected source and load it into the Data Catalog.

After a successful crawl, the information appears in the Data Catalog > Databases tab.

The Schedule checkbox allows automated crawling and profiling at defined intervals, from a minute to a year.

1. Click the Schedule checkbox to enable the Select Period drop-down.
2. Select a time period for the operation from the drop-down menu.
3. Click Schedule to initiate metadata collection from the connected source.

The system will automatically execute the selected operation (Crawl or Crawl & Profile) at the scheduled time.

Other Operations

The Connectors page in OvalEdge provides a centralized view of all configured connectors, including their health status.

Managing connectors includes:

- Connectors Health: Displays the current status of each connector, with a green icon for active connections and a red icon for inactive connections, helping monitor connectivity to data sources.
- Viewing: Click the Eye icon next to the connector name to view connector details, including Tables, Views, and Columns.

Nine Dots Menu Options:

To view, edit, validate, configure, or delete connectors, click on the Nine Dots menu.

- Edit Connector: Update and revalidate the data source.
- Validate Connector: Check the integrity of the connection.
- Settings: Modify connector settings.
  - Crawler: Configure data extraction.
  - Profiler: Customize data profiling rules and methods.
  - Query Policies: Define query execution rules based on roles.
  - Access Instructions: Add notes on how data can be accessed.
  - Business Glossary Settings: Manage term associations at the connector level.
  - Connection Pooling: Configure parameters such as maximum pool size, idle time, and timeouts directly within the application.
  - Others: Configure notification recipients for metadata changes.
- Delete Connector: Remove a connector with confirmation.

Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

| S.No. | Error Message(s) | Error Description & Resolution |
| --- | --- | --- |
| 1 | Handler dispatch failed: java.lang.NoSuchFieldError: DEFAULT_MAX_WAIT | Description: This error occurs when the Impala connector (or a dependent library, such as the JDBC/ODBC driver or Hadoop/Hive libraries) tries to access a field named DEFAULT_MAX_WAIT that does not exist in the loaded version of the class. Resolution: Ensure that the Impala JDBC/ODBC driver version matches the Impala server version. Confirm that any Hadoop/Hive libraries on the classpath are compatible with the connector. |
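A NoSuchFieldError of this kind usually points to a class being loaded from a different library version than the one the calling code was compiled against. One way to look for candidates is to scan the library directory for artifacts present in more than one version. The sketch below assumes jar files follow the common `name-version.jar` naming pattern; the function name `find_conflicting_jars` and the directory passed to it are hypothetical, and the scan does not identify which library defines DEFAULT_MAX_WAIT.

```python
import re
from collections import defaultdict
from pathlib import Path

# Splits names like "example-lib-1.6.jar" into ("example-lib", "1.6").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_conflicting_jars(lib_dir: str) -> dict[str, list[str]]:
    """Map each artifact to its versions when more than one version is present."""
    versions = defaultdict(set)
    for jar in Path(lib_dir).rglob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:  # Jars without a version suffix are skipped.
            versions[match["artifact"]].add(match["version"])
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

Any artifact reported with two or more versions is worth reviewing against the driver's documented dependencies before contacting support.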
