# Apache Hive

This document describes the Apache Hive connector, which enables streamlined metadata management through features such as crawling, data preview, and manual lineage building, and supports secure authentication via Credential Manager.

<figure><img src="https://content.gitbook.com/content/ztcvwwOJCeaE1n6oHp4C/blobs/Ccgof4bzXNHnPsjY30pi/image.png" alt=""><figcaption></figcaption></figure>

## Overview

### Connector Details

| Connector Category                                                              | Big Data Platform |
| ------------------------------------------------------------------------------- | ----------------- |
| Connector Version                                                               | Release 6.3.4     |
| Releases Supported (Available from)                                             | Legacy connector  |
| <p>Connectivity</p><p>\[How the connection is established with Apache Hive]</p> | JDBC              |
| Verified Apache Hive Version                                                    | 5.8.0             |

{% hint style="info" %}
The Apache Hive connector has been validated with the mentioned "Verified Apache Hive Versions" and is expected to be compatible with other supported Apache Hive versions. If there are any issues with validation or metadata crawling, please submit a support ticket for investigation and feedback.
{% endhint %}

### Connector Features

| Feature                                      | Availability |
| -------------------------------------------- | :----------: |
| Crawling                                     |       ✅      |
| Delta Crawling                               |       ❌      |
| Profiling                                    |       ✅      |
| Query Sheet                                  |       ✅      |
| Data Preview                                 |       ✅      |
| Auto Lineage                                 |       ✅      |
| Manual Lineage                               |       ✅      |
| Secure Authentication via Credential Manager |       ✅      |
| Data Quality                                 |       ❌      |
| DAM (Data Access Management)                 |       ❌      |
| Bridge                                       |       ✅      |

### Metadata Mapping

The following objects are crawled from Apache Hive and mapped to the corresponding UI assets.

<table><thead><tr><th width="179.58331298828125">Apache Hive Object</th><th width="189.5">Apache Hive Attribute</th><th width="178">OvalEdge Attribute</th><th width="174.25">OvalEdge Category</th><th width="156.75">OvalEdge Type</th></tr></thead><tbody><tr><td>Schema</td><td>Schema Name</td><td>Schema</td><td>Databases</td><td>Schema</td></tr><tr><td>Table</td><td>Table Name</td><td>Table</td><td>Tables</td><td>Table</td></tr><tr><td>Table</td><td>Table Type</td><td>Type</td><td>Tables</td><td>Table</td></tr><tr><td>Table</td><td>Table Comments</td><td>Source Description</td><td>Descriptions</td><td>Source Description</td></tr><tr><td>Columns</td><td>Column Name</td><td>Column</td><td>Table Columns</td><td>Columns</td></tr><tr><td>Columns</td><td>Data Type</td><td>Column Type</td><td>Table Columns</td><td>Columns</td></tr><tr><td>Columns</td><td>Description</td><td>Source Description</td><td>Table Columns</td><td>Columns</td></tr><tr><td>Views</td><td>View Name</td><td>View</td><td>Tables</td><td>View</td></tr></tbody></table>
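The mapping above can also be expressed as a simple lookup for programmatic reference. This is an illustrative sketch only (the dictionary mirrors the table; it is not a product API):

```python
# Sketch of the Hive-to-OvalEdge attribute mapping from the table above.
# Keys are (Hive object, Hive attribute); values are OvalEdge attributes.
# This is illustrative, not a product API.
HIVE_TO_OVALEDGE = {
    ("Schema", "Schema Name"): "Schema",
    ("Table", "Table Name"): "Table",
    ("Table", "Table Type"): "Type",
    ("Table", "Table Comments"): "Source Description",
    ("Columns", "Column Name"): "Column",
    ("Columns", "Data Type"): "Column Type",
    ("Columns", "Description"): "Source Description",
    ("Views", "View Name"): "View",
}

print(HIVE_TO_OVALEDGE[("Table", "Table Comments")])  # Source Description
```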

## Set up a Connection

### Prerequisites

The following are the prerequisites to establish a connection.

### **Service Account User Permissions**

{% hint style="warning" %}
It is recommended to use a separate service account to establish the connection to the data source, configured with the following minimum set of permissions.
{% endhint %}

{% hint style="info" %}
👨‍💻 **Who can provide these permissions?** These permissions are typically granted by the Apache Hive administrator, as users may not have the required access to assign them independently.
{% endhint %}

<table><thead><tr><th width="215.74993896484375">Objects</th><th>Required Privileges</th><th>Access Permission</th></tr></thead><tbody><tr><td>Schema</td><td>USAGE on the database</td><td>USAGE</td></tr><tr><td>Tables</td><td><p>USAGE on the database</p><p>SELECT privilege on tables</p></td><td>SELECT and USAGE</td></tr><tr><td>Table Columns</td><td><p>USAGE on the database</p><p>SELECT on the table</p></td><td>SELECT and USAGE</td></tr><tr><td>Primary Keys (PK) and Foreign Keys (FK)</td><td><p>USAGE on the database</p><p>SELECT on the table</p></td><td>SELECT and USAGE</td></tr></tbody></table>
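As a rough illustration, an administrator could script the grants above for the service account. This is a hedged sketch: the account, database, and table names are placeholders, and the exact GRANT syntax depends on the authorization model (for example, SQL-standard-based authorization vs. Sentry/Ranger) enabled in your Hive deployment.

```python
# Hedged sketch: generate the GRANT statements implied by the table above for
# a service account. All names are illustrative; exact syntax depends on the
# authorization model (SQL-standard, Sentry, Ranger) enabled for Hive.
def grants_for(user: str, database: str, tables: list[str]) -> list[str]:
    stmts = [f"GRANT USAGE ON DATABASE {database} TO USER {user};"]
    stmts += [f"GRANT SELECT ON TABLE {database}.{t} TO USER {user};" for t in tables]
    return stmts

for stmt in grants_for("ovaledge_svc", "sales_db", ["orders", "customers"]):
    print(stmt)
```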

### Connection Configuration Steps

{% hint style="warning" %}
Users must have the Connector Creator role to configure a new connection.
{% endhint %}

1. Log into **OvalEdge**, go to **Administration > Connectors**, click **+ (New Connector)**, search for **Apache Hive**, and complete the required parameters.

{% hint style="info" %}
Fields marked with an asterisk (\*) are mandatory for establishing a connection.
{% endhint %}

<table><thead><tr><th width="156">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Connector Type</td><td>By default, "Hive" is displayed as the selected connector type.</td></tr><tr><td>Authentication</td><td><p>The following two types of authentication are supported for Apache Hive:</p><ul><li>Kerberos </li><li>Non-Kerberos</li></ul></td></tr></tbody></table>

{% tabs %}
{% tab title="Kerberos Authentication" %}

<table><thead><tr><th width="184.6666259765625">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Credential Manager*</td><td><p>Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selected option.</p><p>Supported Credential Managers:</p><ul><li>Database</li><li>AWS Secrets Manager</li><li>HashiCorp</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td><ul><li>Select the checkbox for the Auto Lineage Add-On to build data lineage automatically.</li></ul></td></tr><tr><td>Connector Name*</td><td>Enter a unique name for the Apache Hive connection (Example: "ApacheHive").</td></tr><tr><td>Connector Description</td><td>Enter a brief description of the connector.</td></tr><tr><td>Connector Environment</td><td>Select the environment (Example: PROD, STG) configured for the connector.</td></tr><tr><td>Server*</td><td>Enter the Apache Hive database server name or IP address (Example: hive-server.company.com or 192.168.1.10).</td></tr><tr><td>Port*</td><td>By default, the Apache Hive port number "10000" is auto-populated. If a custom port is configured for Apache Hive, modify it accordingly.</td></tr><tr><td>Database*</td><td>Enter the database name to which the service account user has access within Apache Hive.</td></tr><tr><td>Driver*</td><td>By default, the Apache Hive driver details are auto-populated.</td></tr><tr><td>Principal</td><td>The Kerberos principal name used for authentication.</td></tr><tr><td>Connection String</td><td><p>Configure the connection string for the Apache Hive database:</p><ul><li><strong>Automatic Mode:</strong> The system generates a connection string based on the provided credentials.</li><li><strong>Manual Mode:</strong> Enter a valid connection string manually.</li></ul><p>Replace placeholders with actual database details. {sid} refers to the Database Name.</p></td></tr><tr><td>Keytab</td><td>Path to the Kerberos keytab file used for authentication.</td></tr><tr><td>Krb5-Configuration File*</td><td>Path to the Kerberos configuration file (krb5.conf) required for authentication.</td></tr></tbody></table>
{% endtab %}

{% tab title="Non-Kerberos Authentication" %}

<table><thead><tr><th width="160.66668701171875">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Credential Manager*</td><td><p>Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selected option.</p><p>Supported Credential Managers:</p><ul><li>Database</li><li>AWS Secrets Manager</li><li>HashiCorp</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td><ul><li>Select the checkbox for the Auto Lineage Add-On to build data lineage automatically.</li></ul></td></tr><tr><td>Connector Name*</td><td>Enter a unique name for the Apache Hive connection (Example: "ApacheHive").</td></tr><tr><td>Connector Description</td><td>Enter a brief description of the connector.</td></tr><tr><td>Connector Environment</td><td>Select the environment (Example: PROD, STG) configured for the connector.</td></tr><tr><td>Server*</td><td>Enter the Apache Hive database server name or IP address (Example: hive-server.company.com or 192.168.1.10).</td></tr><tr><td>Port*</td><td>By default, the Apache Hive port number "10000" is auto-populated. If a custom port is configured for Apache Hive, modify it accordingly.</td></tr><tr><td>Database*</td><td>Enter the database name to which the service account user has access within Apache Hive.</td></tr><tr><td>Driver*</td><td>By default, the Apache Hive driver details are auto-populated.</td></tr><tr><td>Username*</td><td><p>Service account username used for accessing Hive.</p><p>Note:</p><ul><li>Visible only when the installation environment is Linux/Unix.</li></ul></td></tr><tr><td>Password*</td><td><p>Password for the service account user.</p><p>Note:</p><ul><li>Visible only when the installation environment is Linux/Unix.</li></ul></td></tr><tr><td>Connection String</td><td><p>Configure the connection string for the Apache Hive database:</p><ul><li><strong>Automatic Mode:</strong> The system generates a connection string based on the provided credentials.</li><li><strong>Manual Mode:</strong> Enter a valid connection string manually.</li></ul><p>Replace placeholders with actual database details. {sid} refers to the Database Name.</p></td></tr></tbody></table>
{% endtab %}
{% endtabs %}
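For reference, the connection string generated in Automatic Mode follows the standard HiveServer2 JDBC template, with the {sid} placeholder replaced by the configured Database name. A minimal sketch of that substitution (the function name and field values are illustrative, not part of the product):

```python
# Illustrative sketch of the Automatic Mode substitution: the {sid}
# placeholder in the JDBC template is replaced by the configured Database
# name, along with the Server and Port fields. Names are illustrative.
def build_hive_jdbc_url(server: str, port: int, sid: str) -> str:
    template = "jdbc:hive2://{server}:{port}/{sid}"
    return template.format(server=server, port=port, sid=sid)

print(build_hive_jdbc_url("hive-server.company.com", 10000, "sales_db"))
# jdbc:hive2://hive-server.company.com:10000/sales_db
```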

<table><thead><tr><th width="222">Default Governance Roles</th><th>Description</th></tr></thead><tbody><tr><td>Default Governance Roles*</td><td>Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.</td></tr><tr><td><strong>Admin Roles</strong></td><td></td></tr><tr><td>Admin Roles*</td><td>Select one or more users from the drop-down list for Integration Admin and Security &#x26; Governance Admin. All users configured in the security settings are available for selection.</td></tr><tr><td><strong>No of Archive Objects</strong></td><td></td></tr><tr><td>No of Archive Objects*</td><td><p>This specifies the number of recent metadata changes to a dataset at the source that are retained. By default, it is off. To enable it, toggle the Archive button and specify the number of objects to archive.</p><p>Example: Setting it to 4 retrieves the last four changes, displayed in the "Version" column of the Metadata Changes module.</p></td></tr><tr><td><strong>Bridge</strong></td><td></td></tr><tr><td>Select Bridge*</td><td><p>If applicable, select the bridge from the drop-down list.</p><p>The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.</p></td></tr></tbody></table>

2. After entering all connection details, the following actions can be performed:
   1. Click **Validate** to verify the connection.
   2. Click **Save** to store the connection for future use.
   3. Click **Save & Configure** to apply additional settings before saving.
3. The saved connection will appear on the Connectors home page.

## Manage Connector Operations

### Crawl

{% hint style="warning" %}
To perform crawl operations, users must be assigned the Integration Admin role.
{% endhint %}

1. Navigate to the **Connectors** page and click **Crawl/Profile**.
2. Select the schemas to be crawled.
3. The Crawl option is selected by default. To perform both operations, select the **Crawl & Profile** radio button.
4. Click **Run** to collect metadata from the connected source and load it into the **Data Catalog**.
5. After a successful crawl, the information appears in the **Data Catalog > Databases** tab.

### Other Operations

The Connectors page provides a centralized view of all configured connectors, along with their health status.

**Managing connectors includes:**

* **Connectors Health:** Displays the current status of each connector using a green icon for active connections and a red icon for inactive connections, helping to monitor the connectivity with data sources.
* **Viewing:** Click the Eye icon next to the connector name to view connector details, including databases, tables, columns, and codes.

**Nine Dots Menu Options:**

To view, edit, validate, build lineage, configure, or delete connectors, click on the **Nine Dots** menu.

* **Edit Connector:** Update and revalidate the data source.
* **Validate Connector:** Check the connection's integrity.
* **Settings:** Modify connector settings.
  * **Crawler:** Configure data extraction.
  * **Profiler:** Customize data profiling rules and methods.
  * **Query Policies:** Define query execution rules based on roles.
  * **Access Instructions:** Add notes on how data can be accessed.
  * **Business Glossary Settings:** Manage term associations at the connector level.
  * **Others:** Configure notification recipients for metadata changes.
* **Build Lineage:** Automatically build data lineage using source code parsing.
* **Delete Connector:** Remove a connector with confirmation.

## Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

<table><thead><tr><th width="63">S.No.</th><th width="279.5" valign="top">Error Message(s)</th><th>Error Description &#x26; Resolution</th></tr></thead><tbody><tr><td>1</td><td valign="top"><p>Error while validating HIVE connection: Cannot create PoolableConnectionFactory (Could not open client transport with JDBC Uri: <code>jdbc:hive2://https:-1//xxxxxxx.com/:10000/SID</code>: Cannot open without port.)</p></td><td><p><strong>Error Description:</strong></p><p>The JDBC connection string is invalid because the port and URI are incorrectly defined.</p><p><strong>Error Resolution:</strong></p><p>Enter a valid JDBC URI in the format:<br><code>jdbc:hive2://&#x3C;host>:&#x3C;port>/&#x3C;database></code></p><p>Verify that <code>HiveServer2</code> is running and accessible on the specified port.</p><p>Check network/firewall settings to allow connectivity.</p></td></tr></tbody></table>
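A quick pre-flight check can catch the malformed URI above before the connector attempts validation. The sketch below is a hypothetical helper (the pattern and function name are not part of the product) that confirms a host, numeric port, and database are all present:

```python
import re

# Hypothetical pre-flight check (not part of the product): accepts only URIs
# of the form jdbc:hive2://<host>:<port>/<database>, which rules out the
# "Cannot open without port" failure shown above.
HIVE2_URI = re.compile(r"^jdbc:hive2://(?P<host>[\w.-]+):(?P<port>\d+)/(?P<db>\w+)$")

def is_valid_hive_uri(uri: str) -> bool:
    return HIVE2_URI.match(uri) is not None

print(is_valid_hive_uri("jdbc:hive2://hive-server.company.com:10000/sales_db"))  # True
print(is_valid_hive_uri("jdbc:hive2://https:-1//xxxxxxx.com/:10000/SID"))        # False
```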

***

Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA
