# Amazon Athena

Amazon Athena is a service that enables a data analyst to perform interactive queries in the AWS public cloud on data stored in Amazon Simple Storage Service (S3). In the OvalEdge application, the Amazon Athena connector allows you to crawl the database objects like Schemas, Tables, Table Columns, and Views, sample profile the data, and build lineage.

<figure><img src="/files/Bn4mZkUNXnLM5o9SEA02" alt=""><figcaption></figcaption></figure>

## **Connector Capabilities**

<table data-header-hidden><thead><tr><th></th><th width="207.75"></th><th></th></tr></thead><tbody><tr><td><strong>Driver/API</strong></td><td><strong>Version</strong></td><td><strong>Details</strong></td></tr><tr><td>Athena SDK</td><td>1.12.267</td><td><a href="https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-athena/1.12.267">https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-athena/1.12.267</a></td></tr></tbody></table>

## **Technical Specifications**

### **Connector Capabilities**

To crawl Athena schemas, the connector requires permissions to access the data catalog and permissions to execute Athena queries.

Connectivity to Amazon Athena is performed via the **AWS SDK for Java v2 (Athena)**. Query results are stored in the configured **Amazon S3 output location**, which is validated at connection time.

### Connector Features

| Feature                                      | Availability |
| -------------------------------------------- | :----------: |
| Crawling                                     |       ✅      |
| Delta Crawling                               |       ❌      |
| Profiling                                    |       ✅      |
| Query Sheet                                  |       ✅      |
| Auto Lineage                                 |       ✅      |
| Manual Lineage                               |       ✅      |
| Secure Authentication via Credential Manager |       ✅      |
| DAM (Data Access Management)                 |       ❌      |
| Bridge                                       |       ✅      |

### **Querying**

| **Operation**          | **Details**   |
| ---------------------- | ------------- |
| Select                 | Supported     |
| Joins within database  | Supported     |
| Joins outside database | Not supported |
| Aggregations           | Supported     |
| Group By               | Supported     |
| Order By               | Supported     |

{% hint style="success" %}
***Note:** By default, the service account provided for the connector will be used for any query operations. If the service account has write privileges, then Insert/Update/Delete queries can be executed.*
{% endhint %}

## **Connection Details**

### **Pre-requisites**

The minimum permissions required for OvalEdge to validate and crawl the Amazon Athena Database connection are that the path provided in the connection parameters must be a valid S3 path, and the user accessing the files must have access to that specific folder.

The IAM role/user (for example: ovaxxxge-bxxxge-xxx-xxx) must have appropriate Athena, S3, and Glue permissions.

* An admin/service account for OvalEdge Data Catalog Operations.

The minimum privileges required for a user are:

<table><thead><tr><th width="163.0909423828125">Operations</th><th width="172.6363525390625">Objects Impacted</th><th width="401.727294921875">Access Permissions</th></tr></thead><tbody><tr><td>Validation</td><td>S3 Bucket</td><td>s3:HeadBucket, s3:ListBucket, s3:HeadObject</td></tr><tr><td>Validation</td><td>Athena</td><td>athena:ListDatabases</td></tr><tr><td>Crawling</td><td>Schema</td><td>athena:ListDatabases, glue:GetDatabases, glue:GetDatabase</td></tr><tr><td>Crawling</td><td>Table</td><td>athena:ListTableMetadata, glue:GetTables, glue:GetTable</td></tr><tr><td>Crawling</td><td>Columns</td><td>athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults</td></tr><tr><td>Crawling</td><td>Query Results (S3)</td><td>s3:PutObject, s3:GetObject, s3:GetBucketLocation</td></tr></tbody></table>

### Set up a Connection

{% hint style="warning" %}
*Users are required to have the Connector Creator role in order to configure a new connection.*
{% endhint %}

1. Log into **OvalEdge**, go to **Administration > Connectors**, click **+ (New Connector)**, search for **Amazon Athena**, and complete the required parameters.

{% hint style="info" %}
***Note:** Fields marked with an asterisk (\*) are mandatory for establishing a connection.*
{% endhint %}

<table><thead><tr><th width="221.25">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Connector Type</td><td>By default, "<strong>Amazon Athena</strong>" is displayed as the selected connector type.</td></tr><tr><td>Authentication*</td><td><p>Select the authentication type from the drop-down.</p><ul><li>IAM User Authentication</li><li>Role-based Authentication</li></ul></td></tr></tbody></table>

{% tabs %}
{% tab title="IAM User Authentication" %}

<table><thead><tr><th width="199.90911865234375">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Credential Manager*</td><td><p>Select the desired credentials manager from the drop-down list. Relevant parameters will be displayed based on your selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp Vault</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td>Select the checkbox for the Auto Lineage Add-On to build data lineage automatically.</td></tr><tr><td>Connector Name*</td><td>Enter a unique name for the connector.</td></tr><tr><td>Connector Description</td><td>Enter a brief description to describe the purpose of the connector.</td></tr><tr><td>Connector Environment</td><td>Select the environment where the connector will be configured.</td></tr><tr><td>Access Key*</td><td>Enter the AWS access key for authentication.</td></tr><tr><td>Secret Key*</td><td>Enter the AWS secret key associated with the access key.</td></tr><tr><td>Database Region*</td><td>Enter the AWS region where the Athena database is hosted (for example, xx-xxx-1).</td></tr><tr><td>Catalog Name*</td><td>Enter the data catalog name to be used for the connection (default: AwsDataCatalog).</td></tr><tr><td>Output S3 Folder Path*</td><td>Enter the Amazon S3 path where query results will be stored.</td></tr></tbody></table>
{% endtab %}

{% tab title="Role-based Authentication" %}

<table><thead><tr><th width="200.8182373046875">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Credential Manager*</td><td><p>Select the desired credentials manager from the drop-down list. Relevant parameters will be displayed based on your selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp Vault</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td>Select the checkbox for the Auto Lineage Add-On to build data lineage automatically.</td></tr><tr><td>Connector Name*</td><td>Enter a unique name for the connector.</td></tr><tr><td>Connector Description</td><td>Enter a brief description to describe the purpose of the connector.</td></tr><tr><td>Connector Environment</td><td>Select the environment where the connector will be configured.</td></tr><tr><td>Cross-Account Role ARN</td><td>Enter the ARN of the Role Based that allows access to the target account for establishing the connection.</td></tr><tr><td>Database Region*</td><td>Enter the AWS region where the Athena database is hosted (for example, xx-xxx-1).</td></tr><tr><td>Catalog Name*</td><td>Enter the data catalog name to be used for the connection (default: AwsDataCatalog).</td></tr><tr><td>Output S3 Folder Path*</td><td>Enter the Amazon S3 path where query results will be stored.</td></tr></tbody></table>
{% endtab %}
{% endtabs %}

**Default Governance Roles**

<table data-header-hidden><thead><tr><th width="221.5"></th><th></th></tr></thead><tbody><tr><td>Default Governance Roles*</td><td>Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**Admin Roles**

<table data-header-hidden><thead><tr><th width="221.5"></th><th></th></tr></thead><tbody><tr><td>Admin Roles*</td><td>Select one or more users from the dropdown list for Integration Admin and Security &#x26; Governance Admin. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**No of Archive Objects**

<table data-header-hidden><thead><tr><th width="221.5"></th><th></th></tr></thead><tbody><tr><td>No Of Archive Objects*</td><td>This shows the number of recent metadata changes to a dataset at the source. By default, it is off. To enable it, toggle the Archive button and specify the number of objects to archive.Example: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.</td></tr></tbody></table>

**Bridge**

<table data-header-hidden><thead><tr><th width="222.75"></th><th></th></tr></thead><tbody><tr><td>Select Bridge*</td><td>If applicable, select the bridge from the drop-down list.The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.</td></tr></tbody></table>

2. After entering all connection details, the following actions can be performed:
   1. Click **Validate** to verify the connection.
   2. Click **Save** to store the connection for future use.
   3. Click **Save & Configure** to apply additional settings before saving.
3. The saved connection will appear on the Connectors home page.

## Manage Connector Operations

### Crawl/Profile

{% hint style="warning" %}
*To perform crawl and profile operations, users must be assigned the Integration Admin role.*
{% endhint %}

The **Crawl/Profile** button allows users to select one or more schemas for **crawling** and **profiling**.&#x20;

1. Navigate to the **Connectors** page and click **Crawl/Profile**.
2. Select the schemas to be crawled.
3. The **Crawl** option is selected by default. To perform both operations, select the **Crawl & Profile** radio button.
4. Click **Run** to collect metadata from the connected source and load it into the **Data Catalog**.
5. After a successful crawl, the information appears in the **Data Catalog > Databases** tab.

The Schedule checkbox allows automated crawling and profiling at defined intervals, from a minute to a year.

1. Click the **Schedule** checkbox to enable the Select Period drop-down.
2. Select a time period for the operation from the drop-down menu.
3. Click **Schedule** to initiate metadata collection from the connected source.
4. The system will automatically execute the selected operation (**Crawl** or **Crawl & Profile**) at the scheduled time.

### Other Operations

The **Connectors** page provides a centralized view of all configured connectors, along with their health status.

#### Managing connectors includes:

* **Connectors Health**: Displays the current status of each connector using a **green** icon for active connections and a **red** icon for inactive connections, helping to monitor the connectivity with data sources.
* **Viewing**: Click the **Eye icon** next to the connector name to view connector details, including databases, tables, columns, and codes.

#### **Nine Dots Menu Options:**

To view, edit, validate, build lineage, configure, or delete connectors, click on the **Nine Dots** menu.

* **Edit Connector**: Update and revalidate the data source.
* **Validate Connector**: Check the connection's integrity.
* **Settings**: Modify connector settings.
  * **Crawler**: Configure data extraction.
  * **Profiler**: Customize data profiling rules and methods.
  * **Query Policies**: Define query execution rules based on roles.
  * **Access Instructions**: Include notes on how to access the data.
  * **Business Glossary Settings**: Manage term associations at the connector level.
  * **Others**: Configure notification recipients for metadata changes.
* **Build Lineage**: Automatically build data lineage using source code parsing.
* **Delete Connector**: Remove a connector with confirmation.

### **Crawler Rules**

There are also Crawler Rules that can be configured to provide more refined results. These rules are nothing more than regular expressions that are written against the source system.  A regular expression can be written to search for matching character sequences against patterns specified by regular expressions in the code. These options include&#x20;

* Include Schema Regex
* Exclude Schema Regex
* Include Table & View Regex
* Exclude Table & View Regex
* Include Column Regex
* Exclude Column Regex
* Include Procedure & Function Regex
* Exclude Procedure & Function Regex

When setting up the regex rules, you can write rules that will either include and/or exclude schema, tables, views, columns, procedures, and functions that start with, end with or have middle characters as defined in the rule.&#x20;

{% hint style="success" %}
***Note:** Non-Regex includes/excludes it is supported for starts with, ends with using %.*
{% endhint %}

**Example**: If you only want schemas that start with "a" then you need to write "a(.\*)" in the Include Schema Regex field. After that, they need to run the crawl job, then the results for the schemas that start only with the character "a” are displayed.

**Example:** If you only want schemas that end with "e," then you need to write "(.\*)e" in the Include Schema Regex field. After that, they need to run the crawl job, then the results for the schemas that end only with the character "e” are displayed.

**Example:** If you only want to exclude schemas that start with "a," then you need to write "a(.\*)" in the Exclude Schema Regex field. After that, they need to run the crawl job, then the results for the schemas without a character "a” at the start of the schema name are displayed.

**Example:** If you only want to exclude schemas that end with "e," then you need to write "(.\*)e" in the Exclude Schema Regex field. After that, they need to run the crawl job, then the results for the schemas without a character "e” at the end of the schema name are displayed.

{% hint style="success" %}
***NOTE:** Above examples are on the Table, Column, and Procedures & functions are executed the same way.*
{% endhint %}

### Profiler Settings

The process of gathering statistics and informative summaries about the connected data source(s). Statistics can help assess the data source's quality before using it in an analysis. Profiling a data source also helps identify relationships between the tables at the entity level and patterns between them. Profiling is always optional; crawling can be run without also profiling. However, if profiling is desired, there are a few options on how to profile the metadata/data.&#x20;

To profile a data source, navigate to the Administration > Crawler tab. To begin the Profiling process,

1. Select the Crawler ID in the Select column (this selects the desired data source).&#x20;
2. Click the Nine Dots button and select **the Settings** option to configure the desired profile settings. *A pop-up window will open, and the Admin user may select several options to tell OvalEdge how to profile the source system. By default, when a data source is profiled, all the rows in a dataset are analyzed to collect the statistics. You can configure the settings to increase the database's efficiency and performance (while the profile job is running).*&#x20;
3. A pop-up window of the profiler setting will be displayed  &#x20;
4. Click on the **Save Changes** button to save the profiler option.  &#x20;

#### **To configure the Profile Setting,**

Click on the Edit icon that allows the Admin user to configure the profiler setting for the selected data source. There are many attributes you can specify in the profile settings.&#x20;

**The attributes are as follows:**

<table data-header-hidden><thead><tr><th width="220.25"></th><th></th></tr></thead><tbody><tr><td><strong>Columns</strong></td><td><strong>Description</strong></td></tr><tr><td><strong>Order</strong></td><td>Order is the sequence in which the profiling is done. </td></tr><tr><td><strong>Day</strong></td><td>The day of the week, profiling is set to run.</td></tr><tr><td><strong>Start/End Time</strong></td><td>The start and end time at which profiling is set to perform.</td></tr><tr><td><strong>Number of Threads</strong></td><td>A thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries executed on the data source. </td></tr><tr><td><strong>Profile Type</strong></td><td><p>There are four main types of data profiling. </p><ul><li><strong>Sample</strong> Profiling runs the profile on a given sample size. The data on columns (like Min, Max, Distinct, Null Count, etc.) will be different compared to full profiles as we calculate them only on sample size. The sample profile is based on two main values. To execute a sample profile, select the profile type as “Sample” and enter a sample profile size(count of records to be profiled).</li><li><strong>Auto</strong> and if the table row count is more than the Rowcount Limit configured in the setting. The sample size should always be less than Rowcount Limit when the Profile Type is Auto. Row Constraint checkbox is checked, then it is set as True. When the profile type is set as Auto, it first runs as a Query type, and later it transfers the type from Query to Sample if the conditions are not met. In Query mode, when you check Rowcount Constraint as TRUE in Query mode and if the table row count exceeds the Rowcount Limit, the profile type is automatically transferred from Query to Sample. </li><li>The <strong>Query</strong> profiling is when the Rowcount Constraint is checked as TRUE, and if the input table row count is less than the Rowcount Limit, then the profiling is executed on the entire table. If the input table row count exceeds the Rowcount Limit, then the profiling skips execution for those tables to avoid performance issues. A profile type set to “Query” will always depend on the Rowcount limit, and the row count constraint must be set as “True”. </li><li>Finally, there is the <strong>Disabled</strong> profile type, which prevents profiling on the selected data source. </li></ul></td></tr><tr><td><strong>Row Count Constraint</strong></td><td>When set to true, this enables the data rule profiling.</td></tr><tr><td><strong>Row Count Limit</strong></td><td>The number of rows of data to be profiled. </td></tr><tr><td><strong>Sample Profile Size</strong></td><td>The total number of rows to be included in profiling.</td></tr><tr><td><strong>Query Timeout</strong></td><td>The length of time in seconds to allow the query to run on a remote database before timing out. </td></tr></tbody></table>

### Access Instruction

It allows the Crawler admin to write the instructions and guide the user to crawl the data source.

* You can provide the instruction in **the Crawler > Setting page**&#x20;
* Click the **Access Instruction** tab
* Enter the instructions&#x20;
* Click the **Save Changes** button. Once you add the access instruction for a specific connection in the crawler settings, it will appear in the connection hierarchy, like database/schemas.

### Other&#x20;

#### **User Notification**&#x20;

To set up the User Notifications for the data source, complete the following steps:&#x20;

1. In the **Crawler Information** page, select the specific Crawler ID for which the User Notification needs to be included and click on the Nine Dots button. The connectors options menu is displayed. &#x20;
2. Select the **Settings** options, and the Connectors Settings page is displayed.
3. Click on the **Others** tab, and the Send Metadata Changes Notifications To and Context URL sections are displayed.&#x20;
4. Select whether the notifications for the Data Owner and Data Steward under the specific Roles need to be sent or not.
5. Select the desired role from the Roles dropdown menu and click on the **Save Changes** button. The notifications will be successfully sent to select Data Owner and Data Steward.

### **Additional Information**

1. Athena restricts each account to 100 databases, and databases cannot include over 100 tables.
2. Athena DDL max query limit: 20 DDL active queries.
3. Amazon S3 bucket limit is 100 buckets per account by default – you can request to increase it up to 1,000 S3 buckets per account.

## **FAQs**

1. Is there a step-by-step way to upgrade to the AWS Data Catalog?

Yes. A step-by-step guide can be found [here](http://docs.aws.amazon.com/athena/latest/ug/glue-athena.html).

2. Can I run any Hive Query on Athena?

Amazon Athena uses Hive only for DDL (Data Definition Language) and for creation/modification and deletion of tables and/or partitions. Please [click here](http://docs.aws.amazon.com/athena/latest/ug/language-reference.html) for a complete list of statements supported. Athena uses Presto when you run SQL queries on Amazon S3. You can run ANSI-Compliant SQL SELECT statements to query your data in Amazon S3.

***

&#x20;Copyright © 2026, OvalEdge LLC, Peachtree Corners GA USA


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.ovaledge.com/release8.1/connectors/connector-repositories/data-warehouse/amazon-athena.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
