# Amazon S3

This document describes how to integrate OvalEdge with Amazon S3 through the Amazon S3 connector. The connector supports metadata management features such as crawling, profiling, and data preview, and it authenticates securely through Credential Manager.

<figure><img src="https://content.gitbook.com/content/ztcvwwOJCeaE1n6oHp4C/blobs/RdfIX9dvHCjYmb5v08D0/image.png" alt=""><figcaption></figcaption></figure>

## **Overview**

### **Connector Details**

| Connector Category                                                       | Cloud Storage |
| ------------------------------------------------------------------------ | ------------- |
| OvalEdge Release Current Connector Version                               | 6.3.4         |
| <p>Connectivity</p><p><em>\[How OvalEdge connects to Amazon S3]</em></p> | AWS S3 SDK    |
| <p>OvalEdge Releases Supported</p><p>(Available from)</p>                | Release 4.0   |

### **Connector Features**

| Feature                                      | Availability |
| -------------------------------------------- | :----------: |
| Crawling / Cataloging                        |       ✅      |
| Delta Crawling                               |       ❌      |
| Profiling                                    |       ✅      |
| Query Sheet                                  |      NA      |
| Data Preview                                 |       ✅      |
| Auto Lineage                                 |      NA      |
| Manual Lineage                               |       ✅      |
| Secure Authentication via Credential Manager |       ✅      |
| Data Quality                                 |       ✅      |
| DAM (Data Access Management)                 |       ✅      |
| Bridge                                       |       ✅      |

### **Metadata Mapping**

The following objects are crawled from Amazon S3 and mapped to the corresponding UI assets.

<table><thead><tr><th width="170.66668701171875">Amazon S3 Object</th><th width="182.8333740234375">Amazon S3 Attribute</th><th width="182.6666259765625">OvalEdge Attribute</th><th width="180.5">OvalEdge Category</th><th width="177.1666259765625">OvalEdge Type</th></tr></thead><tbody><tr><td>Bucket</td><td>Bucket</td><td>Bucket</td><td>Bucket</td><td>Bucket</td></tr><tr><td>Folder</td><td>Folder</td><td>Folder</td><td>Folder</td><td>Folder</td></tr><tr><td>File</td><td>File</td><td>File</td><td>File</td><td>File</td></tr><tr><td>XLSX</td><td>File</td><td>File</td><td>File</td><td>XLSX</td></tr><tr><td>XLS</td><td>File</td><td>File</td><td>File</td><td>XLS</td></tr><tr><td>CSV</td><td>File</td><td>File</td><td>File</td><td>CSV</td></tr><tr><td>TXT</td><td>File</td><td>File</td><td>File</td><td>TXT</td></tr><tr><td>PARQUET</td><td>File</td><td>File</td><td>File</td><td>PARQUET</td></tr><tr><td>ORC</td><td>File</td><td>File</td><td>File</td><td>ORC</td></tr><tr><td>JSON</td><td>File</td><td>File</td><td>File</td><td>JSON</td></tr><tr><td>YAML</td><td>File</td><td>File</td><td>File</td><td>YAML</td></tr></tbody></table>
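The mapping above can be illustrated with a short Python sketch. This is not OvalEdge's implementation; the function name `ovaledge_type`, the trailing-slash folder convention, and the fallback to the generic `File` type for unlisted extensions are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# File extensions listed in the mapping table, keyed to the "OvalEdge Type" column.
KNOWN_TYPES = {
    ".xlsx": "XLSX",
    ".xls": "XLS",
    ".csv": "CSV",
    ".txt": "TXT",
    ".parquet": "PARQUET",
    ".orc": "ORC",
    ".json": "JSON",
    ".yaml": "YAML",
}


def ovaledge_type(object_key: str) -> str:
    """Return the OvalEdge Type an S3 object key might map to (illustrative only)."""
    # Assumption: keys ending in "/" represent folders, a common S3 convention.
    if object_key.endswith("/"):
        return "Folder"
    suffix = PurePosixPath(object_key).suffix.lower()
    # Assumption: extensions not listed in the table fall back to the generic type.
    return KNOWN_TYPES.get(suffix, "File")


if __name__ == "__main__":
    for key in ("sales/2024/", "sales/2024/orders.csv", "logs/app.log"):
        print(key, "->", ovaledge_type(key))
```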

## **Set up a Connection**

### **Prerequisites**

The following are the prerequisites to establish a connection:

#### **Service Account User Permissions**

{% hint style="warning" %}
It is recommended to use a separate service account to establish the connection to the data source, configured with the following minimum set of permissions.
{% endhint %}

{% hint style="info" %}
👨‍💻**Who can provide these permissions?** These permissions are typically granted by the Amazon S3 administrator, as users may not have the required access to assign them independently.
{% endhint %}

<table><thead><tr><th width="247.3333740234375">Objects</th><th>Access Permission</th></tr></thead><tbody><tr><td>Buckets</td><td><p>ListAllMyBuckets</p><p>GetBucketLocation</p><p>GetBucketTagging</p><p>GetEncryptionConfiguration</p></td></tr><tr><td>Folders</td><td><p>ListBucket</p><p>GetBucketLocation</p><p>GetEncryptionConfiguration</p></td></tr><tr><td>Files</td><td><p>ListBucket</p><p>GetBucketLocation</p><p>GetEncryptionConfiguration</p></td></tr><tr><td>Profiling</td><td>GetObject</td></tr></tbody></table>
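As a minimal sketch, the permissions in the table could be expressed as an IAM policy document similar to the one generated below. The helper name `build_s3_crawl_policy`, the statement IDs, and the bucket name `example-bucket` are hypothetical; the `s3:` prefix is the standard IAM action namespace for these permissions. Adjust the resources to the buckets that should be cataloged.

```python
import json


def build_s3_crawl_policy(bucket_name: str, include_profiling: bool = True) -> dict:
    """Build an IAM policy document covering the permissions listed in the table."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    statements = [
        {
            # Listing all buckets is an account-level action, so it uses "*".
            "Sid": "ListBuckets",
            "Effect": "Allow",
            "Action": ["s3:ListAllMyBuckets"],
            "Resource": "*",
        },
        {
            # Bucket-level actions used when crawling buckets, folders, and files.
            "Sid": "CrawlBucketMetadata",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetBucketTagging",
                "s3:GetEncryptionConfiguration",
            ],
            "Resource": bucket_arn,
        },
    ]
    if include_profiling:
        # Profiling reads object contents, so GetObject applies to the objects.
        statements.append(
            {
                "Sid": "ProfileObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # "example-bucket" is a placeholder bucket name.
    print(json.dumps(build_s3_crawl_policy("example-bucket"), indent=2))
```

Passing `include_profiling=False` omits the `s3:GetObject` statement for connections that only crawl metadata.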

### **Connection Configuration Steps**

{% hint style="warning" %}
Users must have the Connector Creator role to configure a new connection.
{% endhint %}

1. Log in to **OvalEdge**, go to **Administration > Connectors**, click **+ (New Connector)**, search for **Amazon S3**, and complete the required parameters.

{% hint style="info" %}
Fields marked with an asterisk (**\***) are mandatory for establishing a connection.
{% endhint %}

<table><thead><tr><th width="219">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Connector Type</td><td>By default, "Amazon S3" is displayed as the selected connector type.</td></tr><tr><td>Authentication<strong>*</strong></td><td><p>The following two types of authentication are supported for Amazon S3:</p><ul><li>Role Based Authentication (Default)</li><li>IAM User Authentication</li></ul></td></tr></tbody></table>

{% tabs %}
{% tab title="Role Based Authentication" %}

<table><thead><tr><th width="209.8333740234375">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Credential Manager*</td><td><p>Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td><ul><li>Select the checkbox for <strong>Data Quality Add-On</strong> to identify data quality issues using data quality rules and anomaly detection.</li><li>Select the checkbox for <strong>Data Access Add-On</strong> to enable the data access functionality.</li></ul></td></tr><tr><td>Connector Name*</td><td><p>Enter a unique name for the Amazon S3 connection</p><p>(Example: "AmazonS3db").</p></td></tr><tr><td>Connector Description</td><td>Enter a brief summary or details about the connector.</td></tr><tr><td>Connector Environment</td><td>Select the environment (Example: PROD, STG) configured for the connector.</td></tr><tr><td>Cross-Account Role ARN</td><td>Enter the ARN (Amazon Resource Name) of the role used for cross-account access.</td></tr><tr><td>Filter by tags</td><td>Enter one or more tags to narrow down and display only the items associated with those tags.</td></tr><tr><td>Region</td><td>Enter the region where the Amazon S3 files or resources are located.</td></tr></tbody></table>
{% endtab %}

{% tab title="IAM User Authentication" %}

<table><thead><tr><th width="209.8333740234375">Field Name</th><th>Description</th></tr></thead><tbody><tr><td>Credential Manager*</td><td><p>Select the desired credential manager from the drop-down list. Relevant parameters will be displayed based on the selection.</p><p>Supported Credential Managers:</p><ul><li>OE Credential Manager</li><li>AWS Secrets Manager</li><li>HashiCorp</li><li>Azure Key Vault</li></ul></td></tr><tr><td>License Add Ons</td><td><ul><li>Select the checkbox for <strong>Data Quality Add-On</strong> to identify data quality issues using data quality rules and anomaly detection.</li><li>Select the checkbox for <strong>Data Access Add-On</strong> to enable the data access functionality.</li></ul></td></tr><tr><td>Auto Lineage</td><td>Not Supported</td></tr><tr><td>Data Quality</td><td>Supported</td></tr><tr><td>Data Access</td><td>Supported</td></tr><tr><td>Connector Name*</td><td><p>Enter a unique name for the Amazon S3 connection</p><p>(Example: "AmazonS3db").</p></td></tr><tr><td>Connector Description</td><td>Enter a brief summary or details about the connector.</td></tr><tr><td>Connector Environment</td><td>Select the environment (Example: PROD, STG) configured for the connector.</td></tr><tr><td>Access key*</td><td>Enter the AWS Access Key ID used to authenticate the IAM user.</td></tr><tr><td>Secret key*</td><td>Enter the AWS Secret Access Key associated with the Access Key ID.</td></tr><tr><td>Filter by tags</td><td>Enter one or more tags to narrow down and display only the items associated with those tags.</td></tr><tr><td>Region</td><td>Enter the region where the Amazon S3 files or resources are located.</td></tr></tbody></table>
{% endtab %}
{% endtabs %}
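For Role Based Authentication, it can help to check the Cross-Account Role ARN before entering it and to prepare the policy that lets the authenticating identity assume that role. The sketch below is illustrative only: the helper names `parse_role_arn` and `assume_role_policy` are hypothetical, and the pattern assumes a standard IAM role ARN with a 12-digit account ID. The checks that OvalEdge itself performs during validation may differ.

```python
import re

# Assumption: a standard IAM role ARN, optionally with a path before the role name.
ROLE_ARN_PATTERN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(?P<account>\d{12}):role/(?P<name>[\w+=,.@/-]+)$"
)


def parse_role_arn(role_arn: str) -> dict:
    """Split a role ARN into account ID and role name, or raise ValueError."""
    match = ROLE_ARN_PATTERN.match(role_arn.strip())
    if match is None:
        raise ValueError(f"Not a valid IAM role ARN: {role_arn!r}")
    return {"account_id": match.group("account"), "role_name": match.group("name")}


def assume_role_policy(role_arn: str) -> dict:
    """Build an identity policy that allows assuming the given role."""
    parse_role_arn(role_arn)  # Fail early if the ARN is malformed.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn.strip(),
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN with a dummy account ID and role name.
    print(parse_role_arn("arn:aws:iam::123456789012:role/example-crawler-role"))
```

The target role also needs a trust policy that names the authenticating identity as a principal; that part is configured on the role itself in AWS IAM.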

**Default Governance Roles**

<table data-header-hidden><thead><tr><th width="219.8333740234375"></th><th></th></tr></thead><tbody><tr><td>Default Governance Roles<strong>*</strong></td><td>Select the appropriate users or teams for each governance role from the drop-down list. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**Admin Roles**

<table data-header-hidden><thead><tr><th width="219.8333740234375"></th><th></th></tr></thead><tbody><tr><td>Admin Roles<strong>*</strong></td><td>Select one or more users from the drop-down list for Integration Admin and Security &#x26; Governance Admin. All users configured in the security settings are available for selection.</td></tr></tbody></table>

**No of Archive Objects**

<table data-header-hidden><thead><tr><th width="219.83331298828125"></th><th></th></tr></thead><tbody><tr><td>No Of Archive Objects<strong>*</strong></td><td><p>Specifies how many recent metadata changes to a dataset at the source are archived. This option is off by default. To enable it, turn on the Archive toggle and specify the number of objects to archive.</p><p><strong>Example</strong>: Setting it to 4 retrieves the last four changes, displayed in the 'Version' column of the 'Metadata Changes' module.</p></td></tr></tbody></table>

**Bridge**

<table data-header-hidden><thead><tr><th width="219.8333740234375"></th><th></th></tr></thead><tbody><tr><td>Select Bridge<strong>*</strong></td><td><p>If applicable, select the bridge from the drop-down list.</p><p>The drop-down list displays all active bridges that have been configured. These bridges facilitate communication between data sources and the system without requiring changes to firewall rules.</p></td></tr></tbody></table>

2. After entering all connection details, the following actions can be performed:
   1. Click **Validate** to verify the connection.
   2. Click **Save** to store the connection for future use.
   3. Click **Save & Configure** to apply additional settings before saving.
3. The saved connection will appear on the Connectors home page.

## **Manage Connector Operations**

### **Crawl/Profile**

{% hint style="info" %}
To perform crawl and profile operations, users must be assigned the Integration Admin role.
{% endhint %}

The **Crawl/Profile** button allows users to select one or more schemas for crawling and profiling.

1. Navigate to the Connectors page and click **Crawl/Profile**.
2. Select the schemas to be crawled.
3. The **Crawl** option is selected by default. To perform both operations, select the **Crawl & Profile** radio button.
4. Click **Run** to collect metadata from the connected source and load it into the **Data Catalog**.
5. After a successful crawl, the information appears in the **Data Catalog** > **Databases/Files/File Columns** tab.

The **Schedule** checkbox allows automated crawling and profiling at defined intervals, from a minute to a year.

1. Click the **Schedule** checkbox to enable the **Select Period** drop-down.
2. Select a time interval for the operation from the drop-down menu.
3. Click **Schedule** to initiate metadata collection from the connected source.
4. The system will automatically execute the selected operation (**Crawl** or **Crawl & Profile**) at the scheduled time.

#### **Other Operations**

The **Connectors** page provides a centralized view of all configured connectors, along with their health status.

**Managing connectors includes:**

* **Connectors Health**: Displays the current status of each connector using a **green** icon for active connections and a **red** icon for inactive connections, helping to monitor the connectivity with data sources.
* **Viewing**: Click the **Eye icon** next to the connector name to view connector details, including databases, tables, columns, and codes.

**Nine Dots Menu Options**:

To edit, validate, configure, or delete a connector, click the **Nine Dots** menu.

* **Edit Connector**: Update and revalidate the data source.
* **Validate Connector**: Check the connection's integrity.
* **Settings**: Modify connector settings.
  * **Crawler**: Configure data extraction.
  * **Access Instructions**: Add notes on how data can be accessed.
  * **Business Glossary Settings**: Manage term associations at the connector level.
  * **Anomaly Detection Settings**: Configure anomaly detection preferences at the connector level.
  * **Others**: Configure notification recipients for metadata changes.
* **Delete Connector**: Remove a connector with confirmation.

## **Connectivity Troubleshooting**

If incorrect parameters are entered, error messages may appear. Ensure all inputs are accurate to resolve these issues. If issues persist, contact the assigned support team.

<table><thead><tr><th width="64.83331298828125">S.No.</th><th width="422.5">Error Message(s)</th><th>Error Description/Resolution</th></tr></thead><tbody><tr><td>1</td><td>Error while validating connection: Please provide valid credentials: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 73GVA0Y9H15Q5K7G; S3 Extended Request ID: jmNMT5vyMU9kEiT68EgfY6IYRwTdvzSh+51qL/6IzxpguBCYe7e1JOJYLpbHOl1t2mqyKlmArTw=; Proxy: null)</td><td><p><strong>Error Description:</strong> Invalid Access Key</p><p><strong>Resolution:</strong> Provide a valid access key</p></td></tr><tr><td>2</td><td>Error while validating connection: Please provide valid credentials: The request signature we calculated does not match the signature you provided. Check your key and signing method. If you start to see this issue after you upgrade the SDK to 1.12.460 or later, it could be because the bucket provided contains '/'. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: NWGSQ9BDSZ2A3H5H; S3 Extended Request ID: 319yH7h/x76swRiPpjxjs8KB/6dLrdGHrrAJs9rD2/HgQWudiMCQJMzj1ItUQAJ1zEsVm/YsCbU=; Proxy: null)</td><td><p><strong>Error Description:</strong> Invalid Secret Key</p><p><strong>Resolution:</strong> Provide a valid secret key</p></td></tr><tr><td>3</td><td>Error while validating connection: Exception while fetching AWSCredentialsProvider : User: arn:aws:iam::479930578883:user/connector_testing is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::479930578883:role/airflow_MWAA (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: 6bd3e40e-6e9c-43e9-8f51-e631727b6afe; Proxy: null)</td><td><p><strong>Error Description:</strong> The AssumeRole permission is missing for cross-account role authentication</p><p><strong>Resolution:</strong> Create a policy with the AssumeRole permission and assign it to the respective authentication role.</p></td></tr><tr><td>4</td><td>Error while validating connection: Incorrect Account ID!</td><td><p><strong>Error Description:</strong> Invalid account ID</p><p><strong>Resolution:</strong> Provide a valid account ID</p></td></tr></tbody></table>
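When validation errors are collected from logs, the AWS error code embedded in the message is usually the quickest way to find the matching row in the table above. The following sketch extracts that code and returns the corresponding resolution. The function name `suggest_resolution` and the lookup structure are hypothetical, and the mapping covers only the four errors documented in the table.

```python
import re

# Resolutions taken from the troubleshooting table, keyed by AWS error code.
RESOLUTIONS = {
    "InvalidAccessKeyId": "Invalid access key. Provide a valid access key.",
    "SignatureDoesNotMatch": "Invalid secret key. Provide a valid secret key.",
}

ERROR_CODE_PATTERN = re.compile(r"Error Code:\s*(\w+)")


def suggest_resolution(message: str) -> str:
    """Map a connection validation error message to a suggested resolution."""
    match = ERROR_CODE_PATTERN.search(message)
    code = match.group(1) if match else None
    if code in RESOLUTIONS:
        return RESOLUTIONS[code]
    # AccessDenied has many possible causes; only the AssumeRole case is documented.
    if code == "AccessDenied" and "AssumeRole" in message:
        return (
            "AssumeRole permission is missing. Create a policy with the "
            "AssumeRole permission and assign it to the authentication role."
        )
    if "Incorrect Account ID" in message:
        return "Invalid account ID. Provide a valid account ID."
    return "Unrecognized error. Contact the assigned support team."


if __name__ == "__main__":
    sample = "Error while validating connection: Incorrect Account ID!"
    print(suggest_resolution(sample))
```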

***

Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA
