Amazon Athena

This article describes how to integrate OvalEdge with Amazon Athena through the Amazon Athena connector, which supports metadata management features such as crawling, profiling, querying, data preview, and lineage building (both automatic and manual).

This connector supports connectivity to Amazon Athena using the AWS SDK and enables metadata extraction for schemas, tables, columns, views, named queries, and prepared statements. It supports both IAM User Authentication and Role-Based Authentication, allowing access to Athena resources, AWS Glue Data Catalog metadata, and Amazon S3 query result locations required for crawling, profiling, and query execution.

Overview

Connector Details

Connector Category: RDBMS

OvalEdge Release Supported: Release 6.3.4 and later

Connectivity (how the connection is established with Amazon Athena): AWS SDK

Verified Amazon Athena Version: Athena Engine v3

The Amazon Athena connector has been validated with the version listed under "Verified Amazon Athena Version" and is expected to be compatible with other supported Amazon Athena versions. If validation or metadata crawling issues occur, submit a support ticket for investigation and feedback.

Connector Features

Feature
Availability

Crawling

Delta Crawling

Profiling

Sample Profiling

Query Sheet

Data Preview

Auto Lineage

Manual Lineage

Secure Authentication via Credential Manager

Data Quality

DAM (Data Access Management)

Bridge

Metadata Mapping

The following objects are crawled from Amazon Athena and mapped to the corresponding UI assets.

| Amazon Athena Object | Amazon Athena Attribute | OvalEdge Attribute | OvalEdge Category | OvalEdge Type |
| --- | --- | --- | --- | --- |
| Schema | database.name | Schema | Schemas | schema |
| Schema | database.description | Source Description | Descriptions | Source Description |
| Table | table.name | Table | Tables | table |
| Table | table.tableType (TABLE/VIEW/EXTERNAL_TABLE) | Table Data Type | Tables | table |
| Table | table.parameters.location | Table Location | Tables | table |
| Table | - (Athena doesn’t carry comments via API) | Table Comments | Descriptions | Source Description |
| Columns | column_name | Column | Table Columns | - |
| Columns | data_type | Column Type | Table Columns | - |
| Columns | ordinal_position | Column Position | Table Columns | - |
| Columns | IS_NULLABLE (YES/NO) | Nullable | Table Columns | - |
| Columns | comment (if present; often empty in Athena) | Source Description | Table Columns | - |
| Views | table.name (where tableType = VIRTUAL_VIEW) | View | Tables | view |
| Views | SHOW CREATE VIEW result | View Query | Views | View |
| Named Queries | namedQuery.name | Name | Views | other |
| Named Queries | namedQuery.queryString | View/Query Text | Views | Other |
| Prepared Statements | preparedStatement.statementName | Name | Views | other |

Set up a Connection

Prerequisites

The following are the prerequisites to establish a connection:

External Supporting Files

The required external JAR files are included as part of the OvalEdge installation artifacts. For driver installation and configuration details, refer to the Connector Drivers Setup Guide. Please contact the OvalEdge Team for assistance related to the driver files and configuration setup.

File Name
Description

athena-2.30.2.jar

Use this file when connecting to Amazon Athena using the AWS SDK

Service Account User Permissions

👨‍💻 Who can provide these permissions? These permissions are typically granted by the Amazon Athena administrator, as users may not have the required access to assign them independently.

An admin or service account is required for OvalEdge Data Catalog operations. The IAM role or user for this account (for example: ovaxxxge-bxxxge-xxx-xxx) must have the appropriate Athena, S3, and AWS Glue permissions listed in the table below.

| Operation | Objects | AWS Athena System APIs / Objects | Access Permissions |
| --- | --- | --- | --- |
| Crawling | Schema (Databases) | athena:ListDatabases, glue:GetDatabases | athena:ListDatabases, glue:GetDatabase |
| Validation | S3 Bucket | s3:HeadBucket, s3:ListBucket, s3:HeadObject | s3:HeadBucket, s3:ListBucket, s3:HeadObject |
| Crawling | Tables | athena:ListTableMetadata, glue:GetTables | athena:ListTableMetadata, glue:GetTable |
| Crawling | Table Columns | athena:StartQueryExecution, athena:GetQueryResults, information_schema.columns | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Crawling & Lineage Building | Views | athena:StartQueryExecution, athena:GetQueryResults, SHOW CREATE VIEW | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Crawling & Lineage Building | External Tables | athena:ListTableMetadata (table parameters) | athena:ListTableMetadata |
| Crawling & Lineage Building | Named Queries | athena:ListNamedQueries, athena:GetNamedQuery, athena:ListWorkGroups | athena:ListNamedQueries, athena:GetNamedQuery, athena:ListWorkGroups |
| Crawling & Lineage Building | Prepared Statements | athena:ListPreparedStatements, athena:GetPreparedStatement, athena:ListWorkGroups | athena:ListPreparedStatements, athena:GetPreparedStatement, athena:ListWorkGroups |
| Profiling | Row Count | athena:StartQueryExecution, athena:GetQueryResults, SELECT COUNT(*) | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Profiling | Data Profiling – Top Values | athena:StartQueryExecution, athena:GetQueryResults, SELECT ... GROUP BY ... ORDER BY ... LIMIT | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Profiling | Sample Data | athena:StartQueryExecution, athena:GetQueryResults, SELECT * ... LIMIT | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Profiling | Non-Null Count | athena:StartQueryExecution, athena:GetQueryResults, SELECT COUNT(*) WHERE column IS NOT NULL | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Profiling | Max / Min / Distinct Count | athena:StartQueryExecution, athena:GetQueryResults, SELECT MAX(), MIN(), COUNT(DISTINCT) | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Data Access & Governance | Governed Data Query Execution | athena:StartQueryExecution, athena:GetQueryResults, SELECT ... WHERE ... | athena:StartQueryExecution, athena:GetQueryResults, s3:GetObject |
| Data Access & Query Execution | Data Query Execution (Async) | athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults | athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults, s3:GetObject |
| Data Access & Query Execution | Data Query Execution (Real-Time) | athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults | athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults, s3:GetObject |
| Connection Validation | Connection Validation | s3:ListBucket, s3:GetBucketLocation, athena:ListWorkGroups | s3:ListBucket, s3:GetBucketLocation, athena:ListWorkGroups |
| All Operations | S3 Output Location | s3:ListBucket, s3:GetBucketLocation, s3:GetObject | s3:ListBucket, s3:GetBucketLocation, s3:GetObject |
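The permissions above can be collected into a single IAM policy document. The following is a minimal sketch rather than an OvalEdge-supplied policy: the function name `build_policy`, the bucket name, and the prefix are hypothetical, and `"Resource": "*"` for Athena and AWS Glue is an assumption that should be narrowed to specific workgroups and catalogs where possible. HeadBucket and HeadObject are S3 API operations that IAM generally authorizes through `s3:ListBucket` and `s3:GetObject`, so the sketch lists those actions instead. `s3:PutObject` is included because the troubleshooting guidance in this article lists it for the output location.

```python
import json

# Actions taken from the permissions table in this article.
ATHENA_ACTIONS = [
    "athena:ListDatabases",
    "athena:ListTableMetadata",
    "athena:ListWorkGroups",
    "athena:ListNamedQueries",
    "athena:GetNamedQuery",
    "athena:ListPreparedStatements",
    "athena:GetPreparedStatement",
    "athena:StartQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
]
GLUE_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
]


def build_policy(results_bucket: str, results_prefix: str = "") -> dict:
    """Return an IAM policy dict for the Athena query result location.

    results_bucket and results_prefix are placeholders for the bucket and
    folder configured in "Output S3 Folder Path".
    """
    prefix = results_prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{results_bucket}"
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Assumption: "*" keeps the sketch short; scope it down in practice.
            {"Sid": "AthenaAccess", "Effect": "Allow",
             "Action": ATHENA_ACTIONS, "Resource": "*"},
            {"Sid": "GlueCatalogRead", "Effect": "Allow",
             "Action": GLUE_ACTIONS, "Resource": "*"},
            # Bucket-level actions apply to the bucket ARN itself.
            {"Sid": "ResultsBucket", "Effect": "Allow",
             "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
             "Resource": bucket_arn},
            # Object-level actions apply to keys under the results prefix.
            {"Sid": "ResultsObjects", "Effect": "Allow",
             "Action": ["s3:GetObject", "s3:PutObject"],
             "Resource": object_arn},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("bucket-name", "athena/results/"), indent=2))
```

The sketch covers only the query result location. Read access to the S3 locations that hold the underlying table data is a separate grant that the table above does not list.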

Connection Configuration Steps

  1. Log into OvalEdge, go to Administration > Connectors, click + (New Connector), search for Amazon Athena, and complete the required parameters.

Note: Fields marked with an asterisk (*) are mandatory for establishing a connection.

Field Name
Description

Connector Type

By default, "Amazon Athena" is displayed as the selected connector type.

Authentication*

Select the authentication type from the drop-down.

  • Role-Based Authentication

  • IAM User Authentication

Field Name
Description

Credential Manager*

Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selection.

Supported Credential Managers:

  • OE Credential Manager

  • AWS Secrets Manager

  • HashiCorp Vault

  • Azure Key Vault

For more details, click here.

License Add Ons

Select the checkbox for the Auto Lineage Add-On to build data lineage automatically. For more details, click here.

Connector Name*

Enter a unique name for the connector.

Connector Description

Enter a brief description to describe the purpose of the connector.

Connector Environment

Select the environment (Example: PROD, STG) configured for the connector. For more details, click here.

Cross-Account Role ARN

Enter the ARN of the IAM role (used with Role-Based Authentication) that grants access to the target account for establishing the connection.

Database Region*

Enter the AWS region where the Amazon Athena resources and associated S3 output location are configured (for example, xx-xxx-1).

Catalog Name*

Enter the name of the Data Catalog that contains the databases and tables to be crawled (default: AwsDataCatalog).

Output S3 Folder Path*

Enter the Amazon S3 folder path where Athena query results are stored (for example, s3://bucket-name/athena/results/). The configured account or role must have access to this location.

Default Governance Roles

Default Governance Roles*

Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.

Admin Roles

Admin Roles*

Select one or more users from the drop-down list for Integration Admin and Security & Governance Admin. All users configured in the security settings are available for selection.

Bridge

Select Bridge*

If applicable, select the bridge from the drop-down list. The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.

  2. After entering all connection details, the following actions can be performed:

    1. Click Validate to verify the connection.

    2. Click Save to store the connection for future use.

    3. Click Save & Configure to apply additional settings before saving.

  3. The saved connection will appear on the Connectors home page.
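Before clicking Validate, the Database Region, Catalog Name, and Output S3 Folder Path values can be sanity-checked locally. The sketch below is illustrative only and is not part of OvalEdge: the function name `check_connection_params` and both regular expressions are assumptions. The patterns check only the shape of each value, not whether the region is enabled or the bucket exists.

```python
import re

# Shape of an AWS region code such as us-east-1 or us-gov-west-1.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")

# s3://<bucket>/<optional folders>/ with a mandatory trailing slash.
# Bucket: 3-63 characters, lowercase letters, digits, dots, and hyphens.
S3_PATH_RE = re.compile(r"^s3://[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]/(.*/)?$")


def check_connection_params(region: str, catalog: str, output_path: str) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not REGION_RE.match(region):
        problems.append(f"Database Region '{region}' does not look like an AWS region code")
    if not catalog.strip():
        problems.append("Catalog Name is empty (the default is AwsDataCatalog)")
    if not S3_PATH_RE.match(output_path):
        problems.append(
            f"Output S3 Folder Path '{output_path}' should look like "
            "s3://bucket-name/path/ and end with a forward slash"
        )
    return problems


if __name__ == "__main__":
    for problem in check_connection_params(
        "us-east-1", "AwsDataCatalog", "s3://bucket-name/athena/results"
    ):
        print(problem)
```

A check like this catches the most common formatting mistakes early, such as a missing trailing slash, but the Validate button remains the authoritative test because it exercises the actual credentials and permissions.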

Manage Connector Operations

Crawl/Profile

The Crawl/Profile button allows users to select one or more schemas for crawling and profiling.

  1. Navigate to the Connectors page and click Crawl/Profile.

  2. Select the schemas to be crawled.

  3. The Crawl option is selected by default. To perform both operations, select the Crawl & Profile radio button.

  4. Click Run to collect metadata from the connected source and load it into the Data Catalog.

  5. After a successful crawl, the information appears in the Data Catalog > Databases tab.

The Schedule checkbox allows automated crawling and profiling at defined intervals, from a minute to a year.

  1. Click the Schedule checkbox to enable the Select Period drop-down.

  2. Select a time period for the operation from the drop-down menu.

  3. Click Schedule to initiate metadata collection from the connected source.

  4. The system will automatically execute the selected operation (Crawl or Crawl & Profile) at the scheduled time.

Other Operations

The Connectors page provides a centralized view of all configured connectors, along with their health status.

Managing connectors includes:

  • Connectors Health: Displays the current status of each connector using a green icon for active connections and a red icon for inactive connections, helping to monitor the connectivity with data sources.

  • Viewing: Click the Eye icon next to the connector name to view connector details.

Nine Dots Menu Options:

To view, edit, validate, build lineage, configure, or delete connectors, click on the Nine Dots menu.

  • Edit Connector: Update and revalidate the data source.

  • Validate Connector: Check the connection's integrity.

  • Settings: Modify connector settings.

    • Crawler: Configure data extraction.

    • Profiler: Customize data profiling rules and methods.

    • Query Policies: Define query execution rules based on roles.

    • Access Instructions: Include notes on how to access the data.

    • Business Glossary Settings: Manage term associations at the connector level.

    • Others: Configure notification recipients for metadata changes.

  • Build Lineage: Automatically build data lineage using source code parsing.

  • Delete Connector: Remove a connector with confirmation.

For more details on connector settings, click here.

Additional Information

  1. Athena restricts each account to 100 databases, and each database to 100 tables.

  2. Athena allows a maximum of 20 active DDL queries at a time.

  3. Amazon S3 allows 100 buckets per account by default; you can request an increase to up to 1,000 buckets per account.

These figures are AWS service quotas, which AWS revises over time; confirm the current values in the AWS service quotas documentation.

Connectivity Troubleshooting

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

S.No.
Error Message(s)
Error Description & Resolution

1

S3 bucket does not exist / Invalid S3 output location

Error Description: The configured Amazon S3 output location is invalid, inaccessible, or does not exist.

Resolution:

  • Verify the S3 path format is s3://bucket-name/path/.

  • Ensure the path ends with a forward slash (/).

  • Verify the S3 bucket exists and is accessible.

  • Ensure the configured credentials have s3:GetBucketLocation, s3:ListBucket, and s3:PutObject permissions.

  • Verify the S3 bucket region matches the Athena region.

  • Validate access using AWS CLI commands.

2

Invalid credentials / Access Denied

Error Description: Authentication to Amazon Athena failed due to invalid credentials or insufficient permissions.

Resolution:

  • Verify that the Access Key and Secret Key are valid.

  • Verify the IAM Role ARN is correct when role-based authentication is used.

  • Ensure the IAM user or role has the required Athena, AWS Glue, and S3 permissions.

  • Confirm the credentials have not expired.

  • Validate access using AWS CLI commands.

3

Unable to assume cross-account role

Error Description: The specified cross-account IAM role cannot be assumed.

Resolution:

  • Verify the Role ARN format and account details.

  • Ensure the role trust policy allows role assumption.

  • Verify the role has the required Athena, AWS Glue, and S3 permissions.

  • Confirm AWS STS access is enabled in the target region.

  • Verify the session duration does not exceed the configured maximum.

4

Athena client initialization failed

Error Description: The Athena client could not be initialized due to configuration, credential, or connectivity issues.

Resolution:

  • Verify the configured AWS region is valid.

  • Confirm credentials are valid.

  • Verify network connectivity to AWS services.

  • Ensure all required AWS SDK dependencies are available.

  • Review application logs for detailed initialization errors.

5

Invalid region

Error Description: The configured AWS region is invalid or unsupported.

Resolution:

  • Verify the region follows the correct AWS format (for example, us-east-1).

  • Ensure Athena is supported in the selected region.

  • Verify the S3 output location resides in the same region.

  • Confirm the region is enabled in the AWS account.

6

Query execution timeout

Error Description: Query execution exceeded the configured timeout limits.

Resolution:

  • Optimize the query to reduce execution time.

  • Use filters or LIMIT clauses when appropriate.

  • Review Athena workgroup timeout settings.

  • Verify network latency and AWS service availability.

  • Retry the operation after validating query performance.

7

Query cancelled / Query failed

Error Description: Athena terminated the query due to execution failures, workgroup policies, syntax issues, or resource limitations.

Resolution:

  • Review the detailed error message returned by Athena.

  • Verify query syntax and object references.

  • Check data scan limits and workgroup restrictions.

  • Ensure the queried objects exist and are accessible.

  • Review Athena and OvalEdge logs for additional details.

8

No schemas returned / Failed to retrieve databases

Error Description: OvalEdge could not retrieve database metadata from the configured catalog.

Resolution:

  • Verify the Catalog Name is correct.

  • Ensure the catalog exists and is accessible.

  • Verify permissions for glue:GetDatabases and related Athena APIs.

  • Confirm the selected region contains the catalog metadata.

  • Review logs for API or permission-related errors.

9

Failed to retrieve tables

Error Description: Table metadata could not be retrieved from Amazon Athena.

Resolution:

  • Verify the database exists and is accessible.

  • Confirm the Catalog Name is configured correctly.

  • Ensure permissions for athena:ListTableMetadata and glue:GetTables are granted.

  • Review logs for metadata retrieval failures.

10

Failed to retrieve columns

Error Description: Column metadata could not be retrieved from Athena metadata tables.

Resolution:

  • Verify the database and table exist.

  • Ensure access to information_schema.columns.

  • Confirm the required query permissions are granted.

  • Review query execution logs for errors.

11

Failed to retrieve query results

Error Description: Query results could not be processed or returned successfully.

Resolution:

  • Verify the query completed successfully in Athena.

  • Ensure result-set metadata is available.

  • Confirm access to the configured S3 output location.

  • Review query execution logs and API responses for details.
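For the "Unable to assume cross-account role" case in the table above, the role in the target account needs a trust policy that lets the calling identity assume it. The sketch below shows the general shape of such a policy. The function name `build_trust_policy`, the example principal ARN (which uses the AWS documentation placeholder account ID 111122223333 and a hypothetical role name), and the optional external ID are assumptions, not values from OvalEdge.

```python
import json


def build_trust_policy(trusted_principal_arn: str, external_id: str = "") -> dict:
    """Return a trust policy allowing one principal to call sts:AssumeRole."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": trusted_principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # Optional: require the caller to present a shared external ID.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Hypothetical principal: the role or user that OvalEdge runs as.
    policy = build_trust_policy("arn:aws:iam::111122223333:role/example-ovaledge-role")
    print(json.dumps(policy, indent=2))
```

The trust policy is attached to the role named in Cross-Account Role ARN. The Athena, AWS Glue, and S3 permissions are granted separately through that role's permission policies.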

FAQs

Is there a step-by-step way to upgrade to the AWS Glue Data Catalog?

Yes. A step-by-step guide can be found here.

Can I run any Hive Query on Athena?

No. Amazon Athena uses Hive only for DDL (Data Definition Language) statements, that is, for creating, modifying, and deleting tables and partitions. Please click here for a complete list of supported statements. SQL queries against data in Amazon S3 run on Athena's Presto/Trino-based engine (Athena engine version 3, the version verified for this connector, is built on Trino), so you can run ANSI-compliant SQL SELECT statements to query your data in Amazon S3.

Some databases are missing after crawling. Why?

Amazon Athena retrieves databases using paginated API calls. If some databases are missing, verify IAM permissions, catalog accessibility, and review crawl logs for pagination-related messages.

Why are some external tables not crawled?

External tables must contain valid input format, output format, and SerDe definitions. Tables with incomplete definitions may be excluded during crawling.

Why is the nullable status of a column incorrect?

Nullable status is derived from the IS_NULLABLE attribute in Athena metadata. Verify the column definition in the AWS Glue Data Catalog.

How are large query results processed?

Query results are retrieved using paginated API requests until all result pages are processed.
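The pagination described above can be illustrated with the Athena `GetQueryResults` API, which returns a `NextToken` while more pages remain. This is a sketch of the general pattern, not the connector's actual code: the function name `fetch_all_rows` is hypothetical, and `athena` is assumed to be a boto3 Athena client created by the caller, for example with `boto3.client("athena")`.

```python
def fetch_all_rows(athena, query_execution_id: str, page_size: int = 1000) -> list:
    """Collect every row of a finished query by following NextToken.

    `athena` is assumed to be a boto3 Athena client supplied by the caller.
    Each row is returned as a list of strings; NULL cells come back as None.
    """
    rows = []
    kwargs = {"QueryExecutionId": query_execution_id, "MaxResults": page_size}
    while True:
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # A NULL cell has no VarCharValue key, so .get() yields None.
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

For SELECT statements, the first row of the first page contains the column headers, so callers typically drop it before treating the remainder as data.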

Why does submitQuery() display a warning for DML or DDL statements?

The connector supports query execution for SELECT statements only. DML and DDL operations are not supported.
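A statement can be checked before submission to see whether it is likely to trigger this warning. The sketch below is a hypothetical pre-check, not the connector's own logic: the function name `is_read_only` is an assumption, and treating statements that begin with WITH as SELECT queries is an assumption as well.

```python
import re

# Leading whitespace, "--" line comments, and "/* */" block comments.
_LEADING_NOISE = re.compile(r"^(?:\s|--[^\n]*(?:\n|$)|/\*.*?\*/)+", re.DOTALL)


def is_read_only(sql: str) -> bool:
    """Return True when the first keyword of the statement is SELECT or WITH."""
    body = _LEADING_NOISE.sub("", sql)
    first_word = re.match(r"[A-Za-z]+", body)
    if not first_word:
        return False
    return first_word.group(0).upper() in {"SELECT", "WITH"}


if __name__ == "__main__":
    for statement in ("SELECT 1", "-- note\nselect * from t", "DROP TABLE t"):
        print(repr(statement), is_read_only(statement))
```

A first-keyword check is deliberately simple. It does not parse the statement, so it should be treated as a convenience filter rather than a security control.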

Why does a query remain in the RUNNING state for a long time?

Query status is monitored through periodic polling. Long-running queries may require optimization or workgroup configuration review.
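Periodic polling of this kind can be sketched with the Athena `GetQueryExecution` API, which reports the states QUEUED, RUNNING, SUCCEEDED, FAILED, and CANCELLED. The example is illustrative and does not reproduce the connector's implementation: the function name `wait_for_query`, the timeout, and the backoff values are assumptions, and `athena` is assumed to be a boto3 Athena client supplied by the caller.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena, query_execution_id, timeout_s=300.0,
                   initial_delay_s=0.5, max_delay_s=10.0, sleep=time.sleep):
    """Poll until the query reaches a terminal state or the timeout elapses.

    Returns (state, reason); reason is None unless Athena reports one.
    The `sleep` parameter exists so that tests can skip real waiting.
    """
    delay = initial_delay_s
    waited = 0.0
    while True:
        response = athena.get_query_execution(QueryExecutionId=query_execution_id)
        status = response["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            return status["State"], status.get("StateChangeReason")
        if waited >= timeout_s:
            raise TimeoutError(
                f"query {query_execution_id} still {status['State']} after {waited:.1f}s"
            )
        sleep(delay)
        waited += delay
        # Exponential backoff keeps the number of API calls low for long queries.
        delay = min(delay * 2, max_delay_s)
```

When a query fails or is cancelled, the returned reason usually carries the detailed Athena error message, which is the first thing to review for a query that never completes.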

Why does getRowCount() return 0?

The table may not contain data, or the query may not have returned any records. Verify the source table and query execution results.

Why are some columns skipped during profiling?

Unsupported data types and columns exceeding profiling limits are automatically excluded.

Why do profiling statistics appear incorrect?

Verify the column data type supports aggregation functions and review query execution logs.

Why are some columns excluded from sample profiling?

Columns whose data types are unsupported and listed in the profiling exclusion list are omitted.

Why are Top 50 Values not displayed?

Verify that the query executed successfully and that the column contains sufficient data values.

Why are null or invalid entries removed from generated JSON?

The connector applies validation and security filtering to remove invalid or potentially unsafe values.

Why are views not displayed after crawling?

View extraction is supported only for catalogs that support view definitions. Verify permissions and view availability.

Why does SHOW CREATE VIEW fail?

Ensure the view exists and that the configured user has permission to access the view definition.

Why are prepared statements not retrieved?

Verify workgroups exist and ensure permissions for athena:ListWorkGroups and athena:GetPreparedStatement.

Why are named queries not retrieved?

Verify permissions for athena:ListNamedQueries and athena:GetNamedQuery, and ensure named queries exist in the selected workgroups.

Why are workgroups not displayed?

Verify athena:ListWorkGroups permission and confirm the configured region is correct.

What causes "No result set metadata" errors?

The query result did not return the expected metadata. Verify successful query execution and review the API response.

Why does query result processing return unexpected values?

Verify column metadata, result-set structure, and source data consistency.

Why do governed data queries return no results?

Verify filter conditions, query syntax, and object accessibility.

Why do SUM, AVG, or STDDEV calculations fail?

These functions are supported only for numeric columns. Verify the selected column data type.
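The restriction can be made concrete with a small query builder that adds SUM, AVG, and STDDEV only when the column type is numeric. This is a sketch under stated assumptions, not the profiler's actual SQL: the function names, the output aliases, and the list of numeric type names are hypothetical, and `COUNT(column)` is used here as a compact way to count non-null values.

```python
# Assumed set of Athena numeric type names (precision/scale is stripped first).
NUMERIC_TYPES = {
    "tinyint", "smallint", "int", "integer", "bigint",
    "float", "real", "double", "decimal",
}


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def profiling_sql(schema: str, table: str, column: str, data_type: str) -> str:
    """Build one aggregate query for a column, adding numeric stats when valid."""
    col = quote_ident(column)
    base_type = data_type.lower().split("(")[0].strip()
    exprs = [
        "COUNT(*) AS row_count",
        f"COUNT({col}) AS non_null_count",
        f"COUNT(DISTINCT {col}) AS distinct_count",
        f"MIN({col}) AS min_value",
        f"MAX({col}) AS max_value",
    ]
    if base_type in NUMERIC_TYPES:
        exprs += [
            f"SUM({col}) AS sum_value",
            f"AVG({col}) AS avg_value",
            f"STDDEV({col}) AS stddev_value",
        ]
    return f"SELECT {', '.join(exprs)} FROM {quote_ident(schema)}.{quote_ident(table)}"


if __name__ == "__main__":
    # Hypothetical schema, table, and column names.
    print(profiling_sql("sales", "orders", "amount", "decimal(10,2)"))
    print(profiling_sql("sales", "orders", "status", "varchar"))
```

Running the generated query for a varchar column simply omits the numeric aggregates, which avoids the type errors that SUM, AVG, or STDDEV raise on non-numeric data.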

Why does Account ID retrieval fail?

Verify sts:GetCallerIdentity permission, credential validity, and AWS STS accessibility.

What should be checked when API calls frequently time out?

Review query complexity, network latency, AWS service availability, and timeout settings.

How can Athena connectivity be validated outside OvalEdge?

Use AWS CLI commands to validate Athena, AWS Glue, STS, and S3 access with the configured credentials or IAM role.
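The same checks can be scripted with the AWS SDK for Python instead of the CLI. The sketch below assumes the caller creates boto3 clients for STS, Athena, AWS Glue, and S3 using the same credentials or role that OvalEdge is configured with. The function name `check_access` is hypothetical, and the calls shown are a representative subset rather than the full set the connector uses.

```python
def check_access(sts, athena, glue, s3, catalog_name: str, output_bucket: str) -> dict:
    """Run one lightweight read call per service and report the outcome.

    The four clients are assumed to be boto3 clients created by the caller,
    for example:
        session = boto3.Session(region_name="us-east-1")
        sts, athena = session.client("sts"), session.client("athena")
        glue, s3 = session.client("glue"), session.client("s3")
    """
    checks = {
        "sts:GetCallerIdentity": lambda: sts.get_caller_identity(),
        "athena:ListWorkGroups": lambda: athena.list_work_groups(),
        "athena:ListDatabases": lambda: athena.list_databases(CatalogName=catalog_name),
        "glue:GetDatabases": lambda: glue.get_databases(),
        "s3:GetBucketLocation": lambda: s3.get_bucket_location(Bucket=output_bucket),
    }
    results = {}
    for name, call in checks.items():
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # report every failure instead of stopping early
            results[name] = f"failed: {exc}"
    return results
```

Each failed entry points at the permission or configuration value to revisit, which narrows down whether a connection error originates in Athena, AWS Glue, S3, or STS.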

What permissions are required for successful crawling and profiling?

The configured IAM user or role must have the required Athena, AWS Glue, S3, and STS permissions described in the Service Account User Permissions section.


Copyright © 2026, OvalEdge LLC, Peachtree Corners GA USA
