Amazon Athena
Amazon Athena is a service that enables a data analyst to perform interactive queries in the AWS public cloud on data stored in Amazon Simple Storage Service (S3). In the OvalEdge application, the Amazon Athena connector allows you to crawl database objects such as schemas, tables, table columns, and views, run sample profiling on the data, and build lineage.

Connector Capabilities
Connectivity to Amazon Athena is established via the AWS SDK.
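For reference, the following is a minimal sketch of how this SDK-level connectivity can be verified outside OvalEdge, using boto3 (the AWS SDK for Python). The credential and region values are placeholders; this is not OvalEdge code.

```python
# Minimal connectivity check against Amazon Athena using boto3.
# All credential/region values below are placeholders.
import boto3

athena = boto3.client(
    "athena",
    region_name="us-east-1",               # Database Region
    aws_access_key_id="<ACCESS_KEY>",      # IAM user access key
    aws_secret_access_key="<SECRET_KEY>",  # IAM user secret key
)

# Listing workgroups is a lightweight call that confirms the
# credentials and region can reach the Athena service.
for wg in athena.list_work_groups()["WorkGroups"]:
    print(wg["Name"])
```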
Connector Capabilities
To crawl the Athena connector schemas, it is necessary to have permission to access information schema.
The connector capabilities are shown below:
Crawling
Supported Objects: Tables, Table Columns, Views, Functions, Scripts
Supported Data Types: All data types from Athena
Profiling
Table Profiling and View Profiling: Row count, column count, view sample data
Column Profiling: Min, Max, Null count, distinct, top 50 values
Full Profiling: Supported
Sample Profiling: Supported
Lineage Building
Table Lineage: Supported
Column Lineage: Supported
Querying
Select: Supported
Insert: Not supported by default
Update: Not supported by default
Delete: Not supported by default
Joins within database: Supported
Joins outside database: Not supported
Aggregations: Supported
Group By: Supported
Order By: Supported
Note: By default, the service account provided for the connector is used for all query operations. If the service account has write privileges, Insert/Update/Delete queries can be executed.
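To illustrate the query flow, the sketch below submits a read-only SELECT to Athena with boto3 under a service account's credentials and polls for the result. The database, table, and output bucket names are hypothetical.

```python
# Illustrative Athena SELECT execution; names are hypothetical.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; Athena writes the results to the output S3 folder.
qid = athena.start_query_execution(
    QueryString="SELECT * FROM my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)["QueryExecutionId"]

# Poll until the query leaves the QUEUED/RUNNING states.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print each returned row (the first row holds the column headers).
if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```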
Connection Details
Pre-requisites
The minimum requirement for OvalEdge to validate the Amazon Athena database connection is that the path given in the connection parameters is valid and that the user accessing the files has access to that particular folder.
An admin/service account for OvalEdge Data Catalog Operations.
The minimum privileges required for a user are:
Crawling (Schema): LIST, GET permission on the database
Crawling (Table): LIST, GET permission on the table
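Since Athena's schema and table metadata are served by the AWS Glue Data Catalog, the LIST/GET access above can be sanity-checked with Glue calls, as in the sketch below. This is an illustrative check assuming default credentials and a placeholder region, not an OvalEdge utility.

```python
# Sanity-check LIST/GET access on databases and tables via the
# Glue Data Catalog; an AccessDeniedException indicates a missing
# permission. The region is a placeholder.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# LIST/GET at the database level: enumerate visible databases.
for db in glue.get_databases()["DatabaseList"]:
    print("database:", db["Name"])
    # LIST/GET at the table level: enumerate tables in each database.
    for table in glue.get_tables(DatabaseName=db["Name"])["TableList"]:
        print("  table:", table["Name"])
```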
To connect to the Amazon Athena database using the OvalEdge application, complete the following steps:
1. Log in to the OvalEdge application.
2. In the left menu, click on the Administration module; the sub-modules associated with Administration are displayed.
3. Click on the Crawler sub-module; the Crawler Information page is displayed.
4. On the Crawler Information page, click on the + icon. The Manage Connection pop-up window with the Search Connector field is displayed.
5. In the Manage Connection pop-up window, select the connection type Amazon Athena. The Manage Connection pop-up window with the Amazon Athena-specific details is displayed.
The following field attributes are required for an Amazon Athena connection:
Connection Type (Mandatory): By default, the connection type is displayed as Amazon Athena. If needed, the connection type can be changed by selecting the desired type from the Connection Type dropdown; the fields associated with the selected connection type are then displayed.
License Type (Mandatory): Select the license type used for the connection; the available option is Standard.
Connection Name (Mandatory): Enter the name of the connection. The name specified in the Connection Name textbox will be used to reference the Amazon Athena database connection in the OvalEdge application. Example: Amazon Athena Connection
Access Key (Mandatory): Access key of an IAM user. This value is accessible from your AWS security credentials page.
Secret Key (Mandatory): Secret key of an IAM user. This value is accessible from your AWS security credentials page.
Database Region (Mandatory): The AWS region of the Amazon Athena database.
Output S3 Folder Path (Mandatory): The folder path in S3 where Athena query results are stored.
Default Governance Roles (Mandatory): Select the required governance roles for the Steward, Custodian, and Owner.
Select Bridge (Optional): The Bridge ID is shown in the Bridge dropdown menu when bridges are configured and added; otherwise it is displayed as "NO BRIDGE".
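To make the roles of these fields concrete, the sketch below shows roughly what a validation step has to accomplish with them: authenticate with the access/secret keys, reach the configured region, and write a trivial query result to the Output S3 Folder Path. All values are placeholders; this is not OvalEdge's actual validation logic.

```python
# Hypothetical mapping of the connection form fields to an Athena call.
import boto3

connection = {
    "access_key": "<ACCESS_KEY>",
    "secret_key": "<SECRET_KEY>",
    "database_region": "us-east-1",
    "output_s3_folder_path": "s3://my-bucket/athena-results/",
}

athena = boto3.client(
    "athena",
    region_name=connection["database_region"],
    aws_access_key_id=connection["access_key"],
    aws_secret_access_key=connection["secret_key"],
)

# A trivial query exercises all four fields at once: it must reach the
# region with the given keys and write its result to the output path.
qid = athena.start_query_execution(
    QueryString="SELECT 1",
    ResultConfiguration={"OutputLocation": connection["output_s3_folder_path"]},
)["QueryExecutionId"]
print("validation query started:", qid)
```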
After entering the connection details in the required fields, click on the Validate button. Once the details are validated, the Save and Save & Configure buttons are enabled.
Click on the Save button to establish the connection, or click on the Save & Configure button to establish the connection and configure the connection settings in one step. When you click on the Save & Configure button, the Connection Settings pop-up window is displayed, where you can configure the connection settings for the selected connector.
Note: The Save & Configure button is displayed only for connectors that require settings configuration.
Connection Settings
After the connection is established successfully, the additional configurations for crawling need to be specified. To configure the crawler settings for the Amazon Athena connector, select the Amazon Athena connection name on the Crawler Information page, click on the nine-dots button, and select the Settings option. The Connection Settings pop-up window is displayed.
The following are the crawler options you need to select for Amazon Athena:
The Settings page has various tabs for crawling, profiling, and remote data access options.
Based on the connection selected, the options will differ; not all crawler options are available for every connection.
Crawler Options:
Tables, Views, and Columns: This crawl discovers the tables, views, and columns and brings them into OvalEdge. This is the default option for crawling.
Procedures, Functions & Views Source Code: This crawl discovers the procedures, functions, and view source code within the selected schemas and brings them into OvalEdge.
Profile Options: Tables and Columns; Views and Columns
Access Instruction: Allows the admin to write instructions that guide the user in crawling the data source.
Other: Send Metadata Change Notifications To and Context URL
Crawler Rules
There are also Crawler Rules that can be configured to provide more refined results. These rules are regular expressions written against the source system; a regular expression searches for character sequences that match the patterns specified in the rule. These options include:
Include Schema Regex
Exclude Schema Regex
Include Table & View Regex
Exclude Table & View Regex
Include Column Regex
Exclude Column Regex
Include Procedure & Function Regex
Exclude Procedure & Function Regex
When setting up the regex rules, you can write rules that include and/or exclude schemas, tables, views, columns, procedures, and functions whose names start with, end with, or contain characters as defined in the rule.
Note: Non-regex include/exclude is supported for "starts with" and "ends with" patterns using %.
Example: If you only want schemas that start with "a", write "a(.*)" in the Include Schema Regex field and run the crawl job; only schemas whose names start with the character "a" are displayed.
Example: If you only want schemas that end with "e", write "(.*)e" in the Include Schema Regex field and run the crawl job; only schemas whose names end with the character "e" are displayed.
Example: If you want to exclude schemas that start with "a", write "a(.*)" in the Exclude Schema Regex field and run the crawl job; only schemas without the character "a" at the start of the schema name are displayed.
Example: If you want to exclude schemas that end with "e", write "(.*)e" in the Exclude Schema Regex field and run the crawl job; only schemas without the character "e" at the end of the schema name are displayed.
Note: The above examples work the same way for tables, columns, and procedures & functions.
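The Python sketch below mimics how such include/exclude rules behave; filter_schemas is a hypothetical helper written for illustration, not OvalEdge's matching code.

```python
# Illustrative include/exclude regex filtering of schema names.
import re

def filter_schemas(schemas, include_regex=None, exclude_regex=None):
    result = []
    for name in schemas:
        # Drop names that fail the include rule (when one is set).
        if include_regex and not re.fullmatch(include_regex, name):
            continue
        # Drop names that match the exclude rule (when one is set).
        if exclude_regex and re.fullmatch(exclude_regex, name):
            continue
        result.append(name)
    return result

schemas = ["analytics", "sales", "archive", "finance"]
print(filter_schemas(schemas, include_regex="a(.*)"))  # ['analytics', 'archive']
print(filter_schemas(schemas, exclude_regex="(.*)e"))  # ['analytics', 'sales']
```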
Profiler Settings
Profiling is the process of gathering statistics and informative summaries about the connected data source(s). Statistics help assess the quality of the data source before using it in an analysis. Profiling a data source also helps identify relationships between tables at the entity level and patterns between them. Profiling is always optional; crawling can be run without profiling. However, if profiling is desired, there are a few options for how to profile the metadata/data.
To profile a data source, navigate to Administration > Crawler. To begin the profiling process:
1. Select the Crawler ID in the Select column (this selects the desired data source).
2. Click the nine-dots button and select the Settings option to configure the desired profile settings. The Profiler Settings pop-up window is displayed, where the Admin user may select several options to tell OvalEdge how to profile the source system. By default, when a data source is profiled, all the rows in a dataset are analyzed to collect the statistics; you can configure the settings to increase the database's efficiency and performance while the profile job is running.
3. Click on the Save Changes button to save the profiler options.
To configure the profile settings, click on the Edit icon, which allows the Admin user to configure the profiler settings for the selected data source. There are many attributes you can specify in the profile settings.
The attributes are as follows:
Order: The sequence in which the profiling is done.
Day: The day of the week on which profiling is set to run.
Start/End Time: The start and end times at which profiling is set to run.
Number of Threads: A thread is a process in which a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries executed on the data source.
Profile Type: There are four main types of data profiling.
Sample profiling runs the profile on a given sample size. Column statistics (such as Min, Max, Distinct, and Null count) will differ from a full profile because they are calculated only on the sample. To execute a sample profile, select the profile type "Sample" and enter a sample profile size (the count of records to be profiled).
Auto profiling first runs as the Query type and automatically transfers from Query to Sample when its conditions are not met: if the Rowcount Constraint checkbox is checked (set to True) and the table row count exceeds the configured Rowcount Limit, the profile type switches from Query to Sample. When the profile type is Auto, the sample size should always be less than the Rowcount Limit.
Query profiling applies when the Rowcount Constraint is checked (True). If the input table row count is less than the Rowcount Limit, profiling is executed on the entire table; if the row count exceeds the Rowcount Limit, profiling is skipped for those tables to avoid performance issues. A profile type set to "Query" always depends on the Rowcount Limit, and the Rowcount Constraint must be set to True.
Finally, there is the Disabled profile type, which prevents profiling on the selected data source.
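As a reading aid, here is a minimal sketch of the Auto decision described above, assuming exactly the behavior stated in this section; choose_profile_type is a hypothetical illustration, not OvalEdge's implementation.

```python
# Hypothetical illustration of the Auto profile-type decision.
def choose_profile_type(table_row_count, rowcount_limit,
                        rowcount_constraint, sample_size):
    # Auto first behaves like the Query type (profile the full table)...
    if rowcount_constraint and table_row_count > rowcount_limit:
        # ...and transfers to Sample when the row-count limit is exceeded.
        return "Sample", sample_size
    return "Query", table_row_count

ptype, rows = choose_profile_type(
    table_row_count=5_000_000,
    rowcount_limit=1_000_000,
    rowcount_constraint=True,
    sample_size=100_000,
)
print(ptype, rows)  # Sample 100000: statistics come from the sample only
```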
Row Count Constraint: When set to True, this enables the data rule profiling.
Row Count Limit: The number of rows of data to be profiled.
Sample Profile Size: The total number of rows to be included in sample profiling.
Query Timeout: The length of time, in seconds, to allow a query to run on the remote database before timing out.
Access Instruction
Access Instruction allows the crawler admin to write instructions that guide the user in crawling the data source.
You can provide the instructions on the Crawler > Settings page:
1. Click the Access Instruction tab.
2. Enter the instructions.
3. Click the Save Changes button. Once you add the access instruction for a specific connection in the crawler settings, it appears in the connection hierarchy (for example, database/schemas).
Other
User Notification
To set up User Notifications for the data source, complete the following steps:
1. On the Crawler Information page, select the specific Crawler ID for which the User Notification needs to be included and click on the nine-dots button. The connector options menu is displayed.
2. Select the Settings option; the connector Settings page is displayed.
3. Click on the Others tab; the Send Metadata Changes Notifications To and Context URL sections are displayed.
4. Select whether notifications need to be sent to the Data Owner and Data Steward under the specific roles.
5. Select the desired role from the Roles dropdown menu and click on the Save Changes button. The notifications will be sent to the selected Data Owner and Data Steward.
Additional Information
Athena restricts each account to 100 databases, and a database cannot include more than 100 tables.
Athena DDL max query limit: 20 active DDL queries.
The Amazon S3 bucket limit is 100 buckets per account by default; you can request an increase to up to 1,000 S3 buckets per account.
FAQs
Is there a step-by-step way to upgrade to the AWS Data Catalog?
Yes. A step-by-step guide can be found here.
Can I run any Hive Query on Athena?
Amazon Athena uses Hive only for DDL (Data Definition Language) and for the creation, modification, and deletion of tables and/or partitions. Please click here for a complete list of supported statements. Athena uses Presto when you run SQL queries on Amazon S3, and you can run ANSI-compliant SQL SELECT statements to query your data in Amazon S3.
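The sketch below illustrates that split by submitting a Hive-style DDL statement and a Presto-executed ANSI SQL SELECT through the same Athena API; the database, table, and bucket names are hypothetical.

```python
# Hive DDL vs. Presto SELECT, both submitted through the Athena API.
import boto3

athena = boto3.client("athena", region_name="us-east-1")
output = {"OutputLocation": "s3://my-bucket/athena-results/"}

# Hive-style DDL: define an external table over files in S3.
athena.start_query_execution(
    QueryString="""
        CREATE EXTERNAL TABLE IF NOT EXISTS logs (
            ts string,
            message string
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 's3://my-bucket/logs/'
    """,
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration=output,
)

# ANSI SQL SELECT (run by Athena's Presto engine) over the same data.
athena.start_query_execution(
    QueryString="SELECT message, COUNT(*) AS n FROM logs "
                "GROUP BY message ORDER BY n DESC",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration=output,
)
```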