Apache Hive

This document describes the Apache Hive connector, which enables streamlined metadata management through features such as crawling, data preview, and manual lineage building, along with secure authentication via Credential Manager.

Overview

Connector Details

Connector Category: Big Data Platform

Connector Version: Release 6.3.4

Releases Supported (Available from): Legacy connector

Connectivity (how the connection is established with Apache Hive): JDBC

Verified Apache Hive Version: 5.8.0

The Apache Hive connector has been validated against the version listed under "Verified Apache Hive Version" and is expected to be compatible with other supported Apache Hive versions. If any issues arise during validation or metadata crawling, please submit a support ticket for investigation and feedback.

Connector Features

Feature | Availability
Crawling |
Delta Crawling |
Profiling |
Query Sheet |
Data Preview |
Auto Lineage |
Manual Lineage |
Secure Authentication via Credential Manager |
Data Quality |
DAM (Data Access Management) |
Bridge |

Metadata Mapping

The following objects are crawled from Apache Hive and mapped to the corresponding UI assets.

Apache Hive Object | Apache Hive Attribute | OvalEdge Attribute | OvalEdge Category | OvalEdge Type
Schema | Schema name | Schema | Databases | Schema
Table | Table Name | Table | Tables | Table
Table | Table Type | Type | Tables | Table
Table | Table Comments | Source Description | Descriptions | Source Description
Columns | Column Name | Column | Table Columns | Columns
Columns | Data Type | Column Type | Table Columns | Columns
Columns | Description | Source Description | Table Columns | Columns
Views | View Name | View | Tables | View

Set up a Connection

Prerequisites

The following are the prerequisites to establish a connection.

Service Account User Permissions

👨‍💻Who can provide these permissions? These permissions are typically granted by the Apache Hive administrator, as users may not have the required access to assign them independently.

Objects | System Tables | Access Permission
Schema | USAGE on the database | USAGE
Tables | USAGE on the database; SELECT privilege on tables | SELECT and USAGE
Table Columns | USAGE on the database; SELECT on the table | SELECT and USAGE
Primary Keys (PK) and Foreign Keys (FK) | USAGE on the database; SELECT on the table | SELECT and USAGE
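
Before configuring the connector, the service account's access can be spot-checked by connecting with the Hive JDBC driver and running a few metadata and sample queries. The following is a minimal sketch, assuming a non-Kerberos HiveServer2 endpoint and the Hive JDBC driver (hive-jdbc) on the classpath; the host, port, database, table name, and credentials are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HivePermissionCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder HiveServer2 endpoint and service account credentials (non-Kerberos).
            String url = "jdbc:hive2://hive-server.company.com:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "svc_ovaledge", "password");
                 Statement stmt = conn.createStatement()) {

                // USAGE on the database: the schemas to be crawled should be visible.
                try (ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
                    while (rs.next()) {
                        System.out.println("Visible database: " + rs.getString(1));
                    }
                }

                // SELECT on tables: a sample read should succeed on a table to be crawled
                // (default.sample_table is a hypothetical table name).
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM default.sample_table LIMIT 1")) {
                    System.out.println("SELECT check " + (rs.next() ? "returned a row" : "returned no rows"));
                }
            }
        }
    }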

Connection Configuration Steps

  1. Log into OvalEdge, go to Administration > Connectors, click + (New Connector), search for Apache Hive, and complete the required parameters.

Fields marked with an asterisk (*) are mandatory for establishing a connection.

Field Name
Description

Connector Type

By default, "Hive" is displayed as the selected connector type.

Authentication

The following two types of authentication are supported for Apache Hive:

  • Kerberos

  • Non-Kerberos

Field Name
Description

Credential Manager*

Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selected option.

Supported Credential Managers:

  • Database

  • AWS Secrets Manager

  • HashiCorp

  • Azure Key Vault

License Add Ons

Auto Lineage: Supported

Data Quality: Not Supported

Data Access: Not Supported

  • Select the checkbox for Auto Lineage Add-On to build data lineage automatically.

Connector Name*

Enter a unique name for the Apache Hive connection (Example: "ApacheHive").

Connector Description

Enter a brief description of the connector.

Connector Environment

Select the environment (Example: PROD, STG) configured for the connector.

Server*

Enter the Apache Hive database server name or IP address (Example: hive-server.company.com or 192.168.1.10).

Port*

By default, the Apache Hive port number "10000" is auto-populated. If required, it can be modified to match the custom port configured for Apache Hive.

Database*

Enter the database name to which the service account user has access within Apache Hive.

Driver*

By default, the Apache Hive driver details are auto-populated.

Principal

Enter the Kerberos principal name used for authentication.

Connection String

Configure the connection string for the Apache Hive database:

  • Automatic Mode: The system generates a connection string based on the provided credentials.

  • Manual Mode: Enter a valid connection string manually.

Replace placeholders with the actual database details; {sid} refers to the Database Name. (Example connection strings are provided below, after the Kerberos fields.)

Keytab

Kerberos keytab file for authentication.

Krb5-Configuration File*

Path to the Kerberos configuration file (krb5.conf) required for authentication.
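
For reference, a non-Kerberos connection string generally follows the pattern jdbc:hive2://<server>:<port>/<database>, while a Kerberos connection string typically appends the HiveServer2 service principal, for example jdbc:hive2://<server>:<port>/<database>;principal=hive/_HOST@EXAMPLE.COM. The sketch below illustrates how a Kerberos login might be performed from a JDBC client; the krb5.conf path, principal, keytab path, realm, and endpoint are placeholders, and the Hadoop client libraries (for UserGroupInformation) plus the Hive JDBC driver are assumed to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class HiveKerberosConnect {
        public static void main(String[] args) throws Exception {
            // Placeholder path to the Kerberos configuration file (Krb5-Configuration File field).
            System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");

            // Switch the Hadoop security layer to Kerberos before logging in.
            Configuration hadoopConf = new Configuration();
            hadoopConf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(hadoopConf);

            // Placeholder client principal and keytab (Principal and Keytab fields).
            UserGroupInformation.loginUserFromKeytab(
                    "svc_ovaledge@EXAMPLE.COM", "/etc/security/keytabs/svc_ovaledge.keytab");

            // The ;principal= suffix names the HiveServer2 service principal, not the client principal.
            String url = "jdbc:hive2://hive-server.company.com:10000/default;principal=hive/_HOST@EXAMPLE.COM";
            try (Connection conn = DriverManager.getConnection(url)) {
                System.out.println("Kerberos connection established: " + !conn.isClosed());
            }
        }
    }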

Default Governance Roles
Description

Default Governance Roles*

Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.

Admin Roles

Admin Roles*

Select one or more users from the drop-down list for Integration Admin and Security & Governance Admin. All users configured in the security settings are available for selection.

No of Archive Objects

No Of Archive Objects*

Specifies the number of recent metadata changes to a dataset at the source that are retained. By default, this option is off. To enable it, toggle the Archive button and specify the number of objects to archive.

Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.

Bridge

Select Bridge*

If applicable, select the bridge from the drop-down list.

The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.

  2. After entering all connection details, the following actions can be performed:

    1. Click Validate to verify the connection.

    2. Click Save to store the connection for future use.

    3. Click Save & Configure to apply additional settings before saving.

  3. The saved connection will appear on the Connectors home page.

Manage Connector Operations

Crawl

  1. Navigate to the Connectors page and click Crawl/Profile.

  2. Select the schemas to be crawled.

  3. The Crawl option is selected by default. To perform both operations, select the Crawl & Profile radio button.

  4. Click Run to collect metadata from the connected source and load it into the Data Catalog.

  5. After a successful crawl, the information appears in the Data Catalog > Databases tab.

Other Operations

The Connectors page provides a centralized view of all configured connectors, along with their health status.

Managing connectors includes:

  • Connectors Health: Displays the current status of each connector using a green icon for active connections and a red icon for inactive connections, helping to monitor the connectivity with data sources.

  • Viewing: Click the Eye icon next to the connector name to view connector details, including databases, tables, columns, and codes.

Nine Dots Menu Options:

To view, edit, validate, build lineage, configure, or delete connectors, click on the Nine Dots menu.

  • Edit Connector: Update and revalidate the data source.

  • Validate Connector: Check the connection's integrity.

  • Settings: Modify connector settings.

    • Crawler: Configure data extraction.

    • Profiler: Customize data profiling rules and methods.

    • Query Policies: Define query execution rules based on roles.

    • Access Instructions: Add notes on how data can be accessed.

    • Business Glossary Settings: Manage term associations at the connector level.

    • Others: Configure notification recipients for metadata changes.

  • Build Lineage: Automatically build data lineage using source code parsing.

  • Delete Connector: Remove a connector with confirmation.

Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

1. Error Message:

Error while validating HIVE connection: Cannot create PoolableConnectionFactory (Could not open client transport with JDBC Uri: jdbc:hive2://https:-1//xxxxxxx.com/:10000/SID: Cannot open without port.)

Error Description:

The JDBC connection string is invalid because the port and URI are incorrectly defined.

Error Resolution:

  • Enter a valid JDBC URI in the format jdbc:hive2://<server>:<port>/<database>.

  • Verify that HiveServer2 is running and accessible on the specified port.

  • Check network/firewall settings to allow connectivity.
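
For reference, a correctly formed non-Kerberos URI might look like jdbc:hive2://hive-server.company.com:10000/sales_db, where sales_db is a hypothetical database name and the host and port should match the actual HiveServer2 endpoint.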


Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA
