Google Data Fusion

Google Data Fusion (Cloud Data Fusion) is a fully managed data integration service on Google Cloud for building and managing enterprise-level ETL/ELT pipelines. It is used to solve complex business problems by extracting, cleansing, transforming, and loading data into data warehouses and other targets.

OvalEdge uses a JDBC driver to connect to the data source, which allows users to crawl and build lineage.

Connector Characteristics

  • Connector Category: ETL

  • Connectivity: JDBC

  • Connector Version: NA

  • Google Data Fusion Versions Supported: 6.9.2

  • OvalEdge Releases Supported: Release 6.3

Prerequisites (Prepare Google Data Fusion Environment)

The following are the prerequisites required for establishing a connection:

  • Driver Details

The JDBC driver is bundled with the OvalEdge installation by default. If it needs to be changed, add the Google Data Fusion drivers to the OvalEdge Jar path (/home/ovaledge/jarpath) so that OvalEdge can communicate with Google Data Fusion.

Google Data Fusion User Account and Permissions

Create a Google Data Fusion account for OvalEdge and grant it the roles and permissions listed below so that OvalEdge can crawl metadata.

Dataproc Service Account (required roles):

  • Cloud Data Fusion Runner

  • Dataproc Worker

  • Editor

  • Cloud Data Fusion API Service Agent

Data Fusion Service Account (required role):

  • Service Account User

Minimum access permission by operation:

  • Connection Validation: Read

  • Crawl Database: Read

  • Crawl ETL Sourcecode: Read

  • Lineage: Read

  1. In the Cloud Console, open the navigation menu (the three horizontal lines at the top left), scroll down, and select Data Fusion under the Analytics section.

  2. Click Create Instance.

  3. Provide the required details: Instance Name, Description, Region, and Version. Then click Create. Instance creation takes about 20 minutes.

  4. Once the instance is created, click the instance to view its details.

Grant the required IAM roles to the Dataproc Service Account shown in the instance details.

  1. Grant the required IAM roles to the default Compute Engine service account and the Google-managed Data Fusion service account.

  2. Open the Navigation Menu and click IAM & Admin.

  3. Make sure the service account has the following roles:

       • Cloud Data Fusion Runner

       • Dataproc Worker

       • Editor

       • Cloud Data Fusion API Service Agent

  4. If any of these roles is missing, click Edit Principal for the service account.

  5. Click the edit icon and add the missing role.

  6. Click Service Accounts in the left menu.

  7. Click the service account to see more details.

  8. Click the Permissions tab at the top and select the checkbox to display all Google-managed service accounts.

  9. Search for “datafusion” to find the Google-managed Data Fusion service account, and check whether it has the “Service Account User” role.

  10. If it does not, click the edit icon and add the “Service Account User” role.
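
The role checks above can also be scripted. The following is a minimal sketch, not part of the OvalEdge product: it assumes the IAM policy has been exported as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`, or `gcloud iam service-accounts get-iam-policy SA_EMAIL --format=json` for the policy on a service account) and reports which required roles a service account is missing. The role IDs are assumed mappings from the display names in this guide, and the service account email and policy are hypothetical.

```python
# Assumed role IDs for the display names used in this guide; verify them
# in your own project before relying on this mapping.
DATAPROC_SA_ROLES = {
    "roles/datafusion.runner",        # Cloud Data Fusion Runner
    "roles/dataproc.worker",          # Dataproc Worker
    "roles/editor",                   # Editor
    "roles/datafusion.serviceAgent",  # Cloud Data Fusion API Service Agent
}
DATA_FUSION_SA_ROLES = {"roles/iam.serviceAccountUser"}  # Service Account User


def missing_roles(policy, member, required):
    """Return the sorted required roles that `member` lacks in `policy`."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(set(required) - granted)


# Hypothetical service account and policy, for illustration only.
member = "serviceAccount:123-compute@developer.gserviceaccount.com"
example_policy = {
    "bindings": [
        {"role": "roles/editor", "members": [member]},
        {"role": "roles/dataproc.worker", "members": [member]},
    ]
}

# Lists the roles that still need to be added for this account.
print(missing_roles(example_policy, member, DATAPROC_SA_ROLES))
```

An empty list means every required role is already granted. Any role that is listed can be added through the Edit Principal steps above.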

Google Data Fusion-Specific Parameters

Fields

  • Project Id

  • Instance Name

  • Region Code

  • Service Account JSON Key File
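
Before entering these parameters, it can help to sanity-check them locally. The sketch below is an illustration under stated assumptions, not OvalEdge's own validation: it checks that a service account key is valid JSON containing the fields such key files normally carry, and builds the instance resource path used by the Cloud Data Fusion API from the Project Id, Region Code, and Instance Name. The function names and sample values are hypothetical.

```python
import json

# Fields that a Google service account key file normally contains.
KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def key_file_problems(key_text):
    """Return a list of problems found in a service account JSON key."""
    try:
        key = json.loads(key_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in KEY_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def instance_resource_name(project_id, region_code, instance_name):
    """Build the resource path the Cloud Data Fusion API uses for an instance."""
    return f"projects/{project_id}/locations/{region_code}/instances/{instance_name}"


# Hypothetical values, for illustration only.
sample_key = json.dumps({"type": "service_account", "project_id": "my-project"})
print(key_file_problems(sample_key))
print(instance_resource_name("my-project", "us-central1", "my-instance"))
```

In practice the key text would be read from the downloaded key file rather than built inline as it is here.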

Connector Settings

The following are the Google Data Fusion Connector settings:

  • Lineage

Limitations

  1. Not Supported Components: NA

Errors & Resolution

  1. Error: Failed to establish a connection. Please check the credentials.

     Resolution: Invalid credentials were provided, or the user or role does not have access.

  2. Error: Connection Timeout

     Resolution: Invalid credentials were provided, or the server is not running.

  3. Error: Errors while downloading the file.

     Resolution:

       • 403: Access denied. Provide appropriate access to the user or role used in the connection.

       • 404: No such key. The object does not exist in the remote location.

  4. Error: Broken Pipeline

     Resolution: Caused by heavy traffic. Retry the same API call after some time.

  5. Error: Role-based errors

     Resolution: Connector-based errors

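
The advice for the Broken Pipeline error (retry the same API call after some time) can be sketched as a retry loop with exponential backoff. This is a generic illustration, not how OvalEdge handles the error internally: `call_with_retry`, the attempt count, and the delays are hypothetical, and `ConnectionError` stands in for whatever exception the failing call actually raises.

```python
import time


def call_with_retry(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying on ConnectionError with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...


# Hypothetical flaky call that fails twice before succeeding.
state = {"calls": 0}


def flaky_api_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("broken pipeline")
    return "ok"


print(call_with_retry(flaky_api_call, base_delay=0.1))
```

Doubling the delay between attempts gives a busy service progressively more time to recover instead of adding to the traffic that caused the failure.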

FAQs (Connector-Specific)

Q1: Can I use the driver to access Google Data Fusion from a Linux system?

A: Yes! The driver can access Google Data Fusion from Linux, Unix, and other non-Windows platforms.
