# Apache HBase

Apache HBase is a distributed big data store that handles large, sparse datasets efficiently. OvalEdge uses an HBase client JAR with Kerberos authentication to connect to the data source, crawl data objects, and profile sample data.

![](https://support.ovaledge.com/hs-fs/hubfs/image-png-Apr-26-2024-11-21-30-8340-AM.png?width=568\&height=263\&name=image-png-Apr-26-2024-11-21-30-8340-AM.png)

## **Connector Capabilities**

The connector capabilities are shown below:

### **Crawling**

| **Features** | **Supported Objects** | **Remarks** |
| ------------ | --------------------- | ----------- |
| Crawling     | Tables, Table Columns | -           |

### **Profiling**

Please see [Profiling Data](https://support.ovaledge.com/profile-data) for more details on profiling.

| **Features**     | **Details**                                   | **Remarks** |
| ---------------- | --------------------------------------------- | ----------- |
| Table Profiling  | Row count, Column count, View sample data           | -           |
| Column Profiling | Min, Max, Null count, Distinct count, Top 50 values | -           |
| Full Profiling   | Supported                                     | -           |
| Sample Profiling | Supported                                     | -           |
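
The following Python sketch shows what the column-level statistics in the table measure. It is illustrative only and is not OvalEdge's implementation. It makes two assumptions: the sampled cell values have already been decoded from HBase bytes to strings, and a missing cell (`None`) counts as a null. The helper name `profile_column` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def profile_column(values: Sequence[Optional[str]], top_n: int = 50) -> dict:
    """Compute basic column statistics over sampled cell values.

    Hypothetical helper for illustration. None stands for a missing cell,
    which is counted as a null.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        # Min and Max use plain string (lexicographic) comparison.
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # Most frequent values first; ties keep first-seen order.
        "top_values": counts.most_common(top_n),
    }


if __name__ == "__main__":
    sample = ["NY", "CA", None, "NY", "TX", None, "NY"]
    print(profile_column(sample, top_n=2))
```

The default `top_n=50` matches the "Top 50 values" listed in the table.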

### **Lineage Building**

| **Lineage Entities** | **Details**   |
| -------------------- | ------------- |
| Table Lineage        | Not Supported |
| Column Lineage       | Not Supported |

## **Prerequisites**

The following are prerequisites for connecting OvalEdge to HBase:

### **Connection Details**

Prepare the following details for the authentication method you plan to use to connect to an HBase database:

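* #### **Kerberos Authentication**

  If you opt for Kerberos authentication, make sure the paths to the following files are ready:

  * Kerberos configuration file (typically named `krb5.conf`)
  * Keytab file
* #### **Non-Kerberos Authentication**

  No authentication files are required. You only need the host name or IP address and the port of the HBase REST server.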
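
For Kerberos authentication, both files must exist at the paths you enter. The Python sketch below checks these files before you fill in the connection form. It assumes the standard MIT krb5 file layout, confirms that both files exist and are readable, and looks for a `default_realm` entry in the `[libdefaults]` section. A missing `default_realm` may be one cause of realm-lookup errors such as **Cannot get Kerberos Realm**. The helper names `default_realm` and `check_kerberos_files` are hypothetical and are not part of OvalEdge.

```python
import os
from typing import List, Optional


def default_realm(krb5_text: str) -> Optional[str]:
    """Return default_realm from the [libdefaults] section, or None if absent."""
    section = None
    for raw in krb5_text.splitlines():
        line = raw.strip()
        # krb5.conf comment lines start with '#' or ';'.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "default_realm":
                return value.strip()
    return None


def check_kerberos_files(krb5_path: str, keytab_path: str) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for label, path in (("krb5 configuration", krb5_path), ("keytab", keytab_path)):
        if not os.path.isfile(path):
            problems.append(f"{label} file does not exist: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"{label} file is not readable: {path}")
    # Only inspect the krb5 file's contents if it exists and is readable.
    if os.path.isfile(krb5_path) and os.access(krb5_path, os.R_OK):
        with open(krb5_path, encoding="utf-8") as fh:
            if default_realm(fh.read()) is None:
                problems.append("krb5 configuration has no default_realm in [libdefaults]")
    return problems
```

Pass the same paths you plan to enter in the **KEYTAB** and **Krb5-Configuration File** fields. Run the check on the machine where those paths are expected to resolve.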

### **Drivers**

The drivers used by the connector are given below:

| **Driver / API**    | **Version** | **Details**                                                                                                                                                                                               |
| ------------------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| HBase Driver        | 2.2.3       | <p><a href="https://mvnrepository.com/artifact/org.apache.hbase/hbase">https://mvnrepository.com/artifact/org.apache.hbase/hbase</a></p><p><em>Note</em>: The latest version is 2.4.4.</p>               |
| HBase Client Driver | 2.2.3       | <p><a href="https://mvnrepository.com/artifact/org.apache.hbase/hbase-client">https://mvnrepository.com/artifact/org.apache.hbase/hbase-client</a></p><p><em>Note</em>: The latest version is 2.4.4.</p> |

### **Configuring Environment Names**

Configuring environment names enables you to select the appropriate environment from the drop-down list when adding a connector. This allows for consistent crawling of schemas across different environments, such as production (PROD), staging (STG), or temporary environments. It also facilitates schema comparisons and assists in application upgrades by providing a temporary environment that can later be deleted.

Configure the environment names for the connector before establishing a connection. If your environments are already configured, skip this step.

#### **Steps to Configure the Environment**

1. Log in to the OvalEdge application.
2. Navigate to **Administration** > **System Settings**.
3. Select the **Connector** tab.
4. Find the key `connector.environment`.
5. Enter the desired environment names (for example, PROD, STG) in the **Value** column.
6. Click ✔ to save.

### **Service Account Permissions**

A service account is required for crawling and profiling. By default, the connector uses this service account for all query operations. If the service account has write privileges, insert, update, and delete queries can also be executed. The minimum privileges required are listed below.

| **Operation**       | **Access Permission** |
| ------------------- | --------------------- |
| Connection validate | Read                  |
| Crawl schemas       | Read                  |
| Crawl tables        | Read                  |
| Profile tables      | Read                  |
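
HBase access control lists grant permissions as action codes: R (read), W (write), X (execute), C (create), and A (admin). For example, an HBase administrator can grant global read access in the HBase shell with `grant '<service_account>', 'R'`. The Python sketch below encodes the table above and checks a granted action string against it. It assumes that the table's "Read" corresponds to HBase's R action code. The names `REQUIRED_ACTIONS` and `missing_permissions` are hypothetical.

```python
# Minimum permission per operation, from the Service Account Permissions table.
# Assumption: the table's "Read" corresponds to HBase's R action code.
REQUIRED_ACTIONS = {
    "Connection validate": "R",
    "Crawl schemas": "R",
    "Crawl tables": "R",
    "Profile tables": "R",
}


def missing_permissions(granted: str) -> dict:
    """Return the operations whose required action codes are not all granted.

    `granted` is a string of HBase action codes such as "R" or "RW".
    """
    granted_codes = set(granted.upper())
    return {
        operation: required
        for operation, required in REQUIRED_ACTIONS.items()
        if not set(required) <= granted_codes
    }


if __name__ == "__main__":
    print(missing_permissions("RW"))  # every operation is covered
    print(missing_permissions("W"))   # all four operations lack R
```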

## **Establish a Connection**

To connect to HBase using the OvalEdge application, complete the following steps:

1. Log in to the **OvalEdge** application.
2. Navigate to **Administration** > **Connectors**.
3. Click the **+** (New Connector) icon.
4. The **Add Connector** pop-up window is displayed. Search for and select the HBase connector.
5. The **Add Connector** pop-up window now displays the details specific to the selected connector type. Enter the relevant information to configure the HBase connection.\
   **Note:** An asterisk (\*) denotes a mandatory field for establishing a connection.

   <table data-header-hidden><thead><tr><th width="150.8333740234375"></th><th></th></tr></thead><tbody><tr><td><strong>Field Name</strong></td><td><strong>Description</strong></td></tr><tr><td><strong>Connector Type</strong></td><td>Select the connector from the drop-down list. By default, 'HBase' is displayed as the selected connector type.</td></tr><tr><td><strong>Authentication*</strong></td><td><p><strong>Kerberos Authentication</strong>: The client is authenticated using the Kerberos configuration file, the keytab file, and the provided principal.</p><p><strong>Non-Kerberos Authentication:</strong> No authentication is required. You only need to provide the HBase REST server details, and the server must be up and running.</p></td></tr><tr><td><strong>Credential Manager</strong></td><td><p>From the drop-down menu, select where you want to save your credentials:</p><p><strong>OE Credential Manager:</strong> The HBase connection is configured with the service account credentials in real time when OvalEdge establishes a connection to the HBase database. If you select the OE Credential Manager option, you must add the credentials manually.</p><p><strong>HashiCorp:</strong> The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge.</p><p><strong>AWS Secrets Manager:</strong> The credentials are stored in the AWS Secrets Manager database server and fetched from the AWS Secrets Manager to OvalEdge.</p><p>For more information on Azure Key Vault, refer to <a href="https://support.ovaledge.com/azurekeyvaultintegration">Azure Key Vault</a>.</p><p>For more information on Credential Manager, refer to <a href="https://support.ovaledge.com/credential-manager">Credential Manager</a>.</p></td></tr><tr><td><strong>License Add Ons</strong></td><td><p>By default, all connectors have a Base Connector License, which allows you to crawl and profile to obtain metadata and statistical information from a data source.</p><p>OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.</p><ul><li>Select the Data Quality Add-On license to identify, report, and resolve data quality issues for a connector whose data supports data quality using DQ Rules/functions, Anomaly detection, Reports, and more.</li></ul></td></tr><tr><td><strong>Connector Name*</strong></td><td><p>Enter a connection name for the HBase database. This reference name helps you easily identify your HBase connection in OvalEdge.</p><p>Example: HBase Connection DB1</p></td></tr><tr><td><strong>Zookeeper Host Quorum*</strong></td><td><p>Host name or IP address of the ZooKeeper quorum (on-premises or cloud-based).</p><p>Example: 18.220.154.229</p></td></tr><tr><td><strong>Zookeeper Port*</strong></td><td><p>ZooKeeper client port. The default port number is 2181.</p><p>Note: Your cluster may use a different port.</p></td></tr><tr><td><strong>HBase Master</strong></td><td><p>IP address and port of the HBase Master server.</p><p>Example: 18.220.154.229:60000</p></td></tr><tr><td><strong>KEYTAB*</strong></td><td><p>Full path to the keytab file.</p><p>Example: D://hbase_configs//chakri.keytab</p></td></tr><tr><td><strong>Kerberos Principal*</strong></td><td><p>Unique identity used for Kerberos authentication.</p><p>Example: chakri/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL</p></td></tr><tr><td><strong>Zookeeper Parent Node*</strong></td><td><p>Parent node in ZooKeeper that contains the other HBase nodes.</p><p>Example: /hbase</p><p>Note: Your cluster may use a different value.</p></td></tr><tr><td><strong>Master Server Principal*</strong></td><td><p>Kerberos identity of the master server in the cluster.</p><p>Example: hbase/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL</p></td></tr><tr><td><strong>Region Server Principal*</strong></td><td><p>Kerberos identity of the region servers in the cluster.</p><p>Example: hbase/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL</p></td></tr><tr><td><strong>Krb5-Configuration File*</strong></td><td>Full path to the krb5 configuration file.</td></tr><tr><td><strong>Default Governance Roles</strong></td><td></td></tr><tr><td><strong>Steward*</strong></td><td>Select the Steward from the drop-down list options.</td></tr><tr><td><strong>Custodian*</strong></td><td>Select the Custodian from the drop-down list options.</td></tr><tr><td><strong>Owner*</strong></td><td>Select the Owner from the drop-down list options.</td></tr><tr><td><strong>Governance Roles 4, 5, 6*</strong></td><td><p>Select the respective user from the drop-down options.</p><p><em><strong>Note:</strong> The drop-down list displays all the configurable roles (for a single user or a team) according to the configurations made in the OvalEdge <strong>Security</strong> > <strong>Governance Roles</strong> section.</em></p></td></tr><tr><td><strong>Admin Roles</strong></td><td></td></tr><tr><td><strong>Integration Admins*</strong></td><td>To add Integration Admin roles, search for or select one or more roles from the Integration Admin options, then click the <strong>Apply</strong> button.<br>The Integration Admin's responsibilities include configuring the connector's crawling and profiling settings and deleting connectors, schemas, or data objects.</td></tr><tr><td><strong>Security and Governance Admins*</strong></td><td><p>To add Security and Governance Admin roles, search for or select one or more roles from the list and click the <strong>Apply</strong> button.<br>The Security and Governance Admin is responsible for:</p><ul><li>Configuring role permissions for the connector and its associated data objects.</li><li>Adding admins to set permissions for the connector's roles and associated data objects.</li><li>Updating governance roles.</li><li>Creating custom fields.</li><li>Developing Service Request templates for the connector.</li><li>Creating approval workflows for Service Request templates.</li></ul></td></tr><tr><td><strong>Select Bridge</strong></td><td><p>With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data source(s) without modifying firewall rules. A bridge provides real-time control, making data movement between source and destination easy. For more information, refer to <a href="https://support.ovaledge.com/bridge-overview">Bridge Overview</a>.</p></td></tr><tr><td><strong>Non-Kerberos</strong></td><td></td></tr><tr><td><strong>Hbase Rest Server*</strong></td><td>Host name or IP address of the server on which the HBase REST server is running.</td></tr><tr><td><strong>Hbase Rest Server Port*</strong></td><td><p>Port on which the HBase REST server is running.</p><p>Example: 20550</p></td></tr></tbody></table>
6. After entering all the required connection details, select the appropriate option based on your preferences:
   1. **Validate:** Click **Validate** to verify the connection details and confirm that a connection can be established.
   2. **Save:** Click **Save** to store the connection details. Once saved, the connection is added to the Connectors home page for easy access.
   3. **Save & Configure:** For connectors that require additional configuration settings, click **Save & Configure**. This opens the Connection Settings pop-up window, where you can configure the necessary settings before saving the connection.
7. Once the connection is validated and saved, it is displayed on the Connectors home page.\
   ***Note:*** You can either save the connection details first, or validate the connection first and then save it.
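
The Kerberos fields in the connection form correspond closely to standard HBase client configuration properties. The Python sketch below shows one plausible mapping and renders it in the `<configuration>`/`<property>` layout used by `hbase-site.xml`. The mapping is an assumption for illustration: this page does not say how OvalEdge passes these fields to the HBase client. The function names are hypothetical, and the HBase Master field is left unmapped.

In a typical Java HBase/Hadoop client, the keytab and Kerberos principal are used to log in through `UserGroupInformation.loginUserFromKeytab(principal, keytabPath)`. The krb5 file location is commonly supplied through the `java.security.krb5.conf` JVM system property.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hbase_client_properties(
    zk_quorum: str,
    zk_port: int,
    znode_parent: str,
    master_principal: str,
    regionserver_principal: str,
) -> Dict[str, str]:
    """Map connection form fields to standard HBase client configuration keys.

    Hypothetical mapping for illustration; not confirmed as OvalEdge's behavior.
    """
    return {
        "hbase.zookeeper.quorum": zk_quorum,                 # Zookeeper Host Quorum
        "hbase.zookeeper.property.clientPort": str(zk_port), # Zookeeper Port
        "zookeeper.znode.parent": znode_parent,              # Zookeeper Parent Node
        "hadoop.security.authentication": "kerberos",
        "hbase.security.authentication": "kerberos",
        "hbase.master.kerberos.principal": master_principal,              # Master Server Principal
        "hbase.regionserver.kerberos.principal": regionserver_principal,  # Region Server Principal
    }


def to_hbase_site_xml(props: Dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout of hbase-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `hbase_client_properties("18.220.154.229", 2181, "/hbase", master_principal, regionserver_principal)` combines the quorum, port, and parent node examples from the table with the two server principals.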

## **Connection Validation Details**

| **S.No** | **Error Message(s)**                       | **Description**                                                                                    |
| -------- | ------------------------------------------ | -------------------------------------------------------------------------------------------------- |
| 1        | **Connection Timeout**                     | Check the firewall configuration and confirm that the required ports are open.                      |
| 2        | **The file path does not exist**           | The specified file path does not exist. Verify the keytab and krb5 configuration file paths.        |
| 3        | **Cannot get Kerberos Realm**              | Ensure that the krb5 configuration and keytab files are at the specified paths.                     |
| 4        | **Key tab login Failure for a given user** | Check that the keytab file details are correct for the given user (Kerberos principal).             |
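
A **Connection Timeout** usually points to a firewall rule or a closed port. The Python sketch below tests raw TCP reachability from the machine that will connect to HBase. The function name `unreachable_endpoints` is hypothetical. The ports you would typically check are the ones from the connection form (ZooKeeper 2181 and, for Non-Kerberos, the REST server port such as 20550); substitute your own values. A successful TCP connection only shows that the port is open. It does not test Kerberos or HBase itself.

```python
import socket
from typing import Iterable, List, Tuple


def unreachable_endpoints(
    endpoints: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> List[Tuple[str, int, str]]:
    """Return (host, port, error) for each endpoint that cannot accept a TCP connection."""
    failures = []
    for host, port in endpoints:
        try:
            # Opens and immediately closes a plain TCP connection; no HBase traffic is sent.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:  # includes timeouts and refused connections
            failures.append((host, port, str(exc)))
    return failures
```

For example, `unreachable_endpoints([("18.220.154.229", 2181), ("18.220.154.229", 20550)])` returns an empty list when both ports accept connections. Any returned entries show which host and port to raise with your network team.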

***Note:*** If you have issues creating a connection, contact your assigned OvalEdge Customer Success Management (CSM) team.

## **Connector Settings**

Once the connection is successfully established, various settings are provided to fetch and analyze the information from the data source.

The connection settings include Crawler, Profiler, Access Instruction, Business Glossary Settings, and others.

To view the Connector Settings page:

1. Go to the Connectors page.
2. From the **9-dots** menu, select **Settings**.
3. The Connector Settings page is displayed, where you can view all the connector settings.
4. When you have finished making your changes, click **Save Changes**. All setting changes are applied to the metadata.

The following table lists the connection settings and their descriptions.

| **Connection Settings**                             | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Crawler**                                         | Crawler settings are configured to connect to a data source and collect and catalog all the data elements in metadata.                                                                                                                                                                                                                                                                                                                                                                                                    |
| **Profiler**                                        | Profiler settings govern the gathering of statistics and informative summaries about the connected data source(s). These statistics can help assess the quality of data sources before you use them for analysis. Profiling is always optional; crawling can be run without profiling. |
| **License Add Ons**                                 | <p>All the connectors will have a Base Connector License by default, which allows you to crawl and profile to obtain metadata and statistical information from a data source. </p><p>OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.</p><ul><li>Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality using DQ Rules/functions, Anomaly detection, Reports, and more.</li></ul> |
| **Access Instruction**                              | Access Instruction allows the data owner to instruct others on using the objects in the application.                                                                                                                                                                                                                                                                                                                                                                                                                      |
| **Business Glossary Settings**                      | The Business Glossary Settings provide flexibility and control over how users view and manage term association within a business glossary at the connector level.                                                                                                                                                                                                                                                                                                                                                         |
| **Others**                                          | <p>The Enable/Disable Metadata Change Notifications option controls notifications about metadata changes to the data objects.</p><ul><li>Use the toggle button to set the Default Governance Roles (Steward, Owner, Custodian, etc.) that receive these notifications.</li><li>Use <strong>Roles</strong> and <strong>Teams</strong> to select the roles and teams that receive notifications of metadata changes.</li></ul> |

***Note:*** For more information, refer to [*Connector Settings*](https://support.ovaledge.com/connector-settings).

## **Crawling of Schema(s)**

The **Crawl/Profile** option allows you to select schemas for the following operations: Crawl, Crawl & Profile, Profile, or Profile Unprofiled. For any scheduled crawlers and profilers, the defined run date and time are displayed in the **Action** section.

1. Navigate to the Connectors page and click **Crawl/Profile**.\
   The **Select Important Schema For Crawling and Profiling** pop-up window is displayed.
2. Select the schema(s).
3. The following actions are available in the **Action** section:
   1. **Crawl**: Crawls the metadata of the selected schema(s).
   2. **Crawl & Profile**: Crawls the metadata of the selected schema(s) and profiles the sample data.
   3. **Profile**: Collects table and column statistics.
   4. **Profile Unprofiled**: Profiles only the data that has not yet been profiled.
   5. **Schedule**: Schedules crawling and/or profiling in advance to run at prescribed times and selected intervals.\
      ***Note:*** For more information on scheduling, refer to [***Scheduling Connector***](https://support.ovaledge.com/how-to-schedule-connectors).
4. Click **Run**. This gathers all metadata from the connected source and adds it to the OvalEdge Data Catalog.

## **Points to note**

* Queries are not supported for HBase.
* Crawler rules do not apply include and exclude regex filters to views, functions, and procedures, because these objects do not exist in HBase.

***

Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA
