Google Data Fusion
Google Data Fusion is a platform for building enterprise-level data integration and transformation solutions. It is used to solve complex business problems by copying or moving data, loading data warehouses, and cleansing and mining data.
OvalEdge uses a JDBC driver to connect to the data source, which allows users to crawl metadata and build lineage.
Connector Characteristics
Connector Category: ETL
Connectivity: JDBC
Connector Version: NA
Google Data Fusion Versions Supported: 6.9.2
OvalEdge Releases Supported: Release 6.3
Prerequisites (Prepare Google Data Fusion Environment)
The following are the prerequisites required for establishing a connection:
Driver Details
The JDBC driver is bundled with the OvalEdge installation by default. If it needs to be changed, add the Google Data Fusion drivers to the OvalEdge Jar path (/home/ovaledge/jarpath) so that OvalEdge can communicate with Google Data Fusion.
Driver | Version | Details
Google Data Fusion User Account and Permissions
Create a Google Data Fusion account for the OvalEdge connection and grant it the following roles and permissions, which are required to crawl metadata into OvalEdge:
Account | Required Roles
Data Proc Service Account | Cloud Data Fusion Runner, Dataproc Worker, Editor, Cloud Data Fusion API Service Agent
Data Fusion Service account | Service Account User
Operation | Minimum Access Permission
Connection Validation | Read
Crawl Database | Read
Crawl ETL Sourcecode | Read
Lineage | Read
In the Cloud Console, click the Navigation menu (the three horizontal lines at the top left), scroll down, and select Data Fusion under the Analytics section.
Click Create Instance.
Provide the required details: Instance Name, Description, Region, and Version. Then click Create. Instance creation takes about 20 minutes.
Once the instance is created, click the instance to view its details.
Grant the required IAM roles to the Data Proc Service Account shown in the instance details.
It is also important to grant the required IAM roles to the default Compute Engine service account (SA) and the Google-managed Data Fusion SA.
To do so, open the Navigation menu and click IAM & Admin.
Make sure the service account has the following roles:
Cloud Data Fusion Runner
Dataproc Worker
Editor
Cloud Data Fusion API Service Agent
If any of these roles are missing, click Edit Principal for the service account.
Then click the edit icon and add the missing roles.
Next, click Service Accounts in the left menu.
Click the service account to see more details.
Click the Permissions tab at the top and select the checkbox to display all Google-managed service accounts.
Search for the Google-managed Data Fusion service account by entering “datafusion”, and check whether it has the “Service Account User” role.
If it does not, click the edit icon and add the “Service Account User” role.
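The role checks described in these steps can also be scripted. The following is a minimal Python sketch, not part of OvalEdge or of any Google tooling, that takes an IAM policy exported as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and reports which of the roles listed in this guide are not bound to a given service account. The role IDs are assumed to be the standard IDs behind the role names used above and should be verified in the IAM console; the function name `missing_roles`, the dictionary `REQUIRED_ROLES`, and the email address are hypothetical.

```python
# Assumed role IDs for the role names listed in this guide.
REQUIRED_ROLES = {
    "dataproc": {
        "roles/datafusion.runner",        # Cloud Data Fusion Runner
        "roles/dataproc.worker",          # Dataproc Worker
        "roles/editor",                   # Editor
        "roles/datafusion.serviceAgent",  # Cloud Data Fusion API Service Agent
    },
    "datafusion": {
        "roles/iam.serviceAccountUser",   # Service Account User
    },
}


def missing_roles(policy, service_account_email, required):
    """Return, sorted, the required roles not bound to the service account."""
    member = f"serviceAccount:{service_account_email}"
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(set(required) - granted)


# Hypothetical policy, shaped like the JSON printed by gcloud.
example_policy = {
    "bindings": [
        {
            "role": "roles/editor",
            "members": ["serviceAccount:compute-sa@example.iam.gserviceaccount.com"],
        }
    ]
}
print(missing_roles(
    example_policy,
    "compute-sa@example.iam.gserviceaccount.com",
    REQUIRED_ROLES["dataproc"],
))
```

Any role ID printed by the sketch corresponds to a role that still has to be added through Edit Principal.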
Google Data Fusion-Specific Parameters
Fields
Details
Project Id
Instance Name
Region Code
Service Account Json Key file
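Before entering these fields in OvalEdge, it can help to sanity-check them locally. The sketch below is illustrative only and assumes two things not stated in this guide: that the key file is a standard Google service account JSON key (with `type`, `project_id`, `private_key`, and `client_email` fields), and that the instance is addressed by the usual Cloud Data Fusion resource name `projects/<project>/locations/<region>/instances/<instance>`. The function names are hypothetical, and how OvalEdge itself uses these values is not covered here.

```python
import json

# Fields a standard service account JSON key is assumed to contain.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_key(path):
    """Load a service account JSON key file and check its basic fields."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    missing = [field for field in REQUIRED_KEY_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key


def instance_resource_name(project_id, region_code, instance_name):
    """Build the instance resource name from the three connector fields."""
    fields = (
        ("Project Id", project_id),
        ("Region Code", region_code),
        ("Instance Name", instance_name),
    )
    for label, value in fields:
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
    return (
        f"projects/{project_id.strip()}"
        f"/locations/{region_code.strip()}"
        f"/instances/{instance_name.strip()}"
    )
```

For example, `instance_resource_name("my-project", "us-central1", "my-instance")` returns `projects/my-project/locations/us-central1/instances/my-instance`, where all three values are placeholders.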
Connector Settings
The following are the Google Data Fusion Connector settings:
Lineage
Limitations
S.No. | Description
1 | Not Supported Components: NA
Errors & Resolution
S.No. | Error Message(s) | Description / Resolution
1 | Failed to establish a connection. Please check the credentials. | Invalid credentials are provided, or the user or role does not have access.
2 | Connection Timeout | Invalid credentials are provided, or the server is not running.
3 | Errors while downloading the File. | 403: Access denied [Provide appropriate access to the user or role used in the connection]; 404: No such key [The object does not exist in the remote.]
4 | Broken Pipeline | Caused by heavy traffic. Retry the same API call after some time.
5
Role based errors
Connector based errors
conn
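For transient failures such as the "Broken Pipeline" error, where the guidance is to call the same API again after some time, a retry with exponential backoff is one common approach. The sketch below is a generic Python illustration and not OvalEdge or Google Data Fusion code; `call_with_retry` and the `request` callable are hypothetical names, and it assumes the failing call surfaces as a Python `ConnectionError` (which includes `BrokenPipeError`).

```python
import time


def call_with_retry(request, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Errors caused by invalid credentials or missing roles are not retried on purpose, because waiting does not resolve them.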
FAQs (Connector-Specific)
Q1: Can I use the driver to access Google Data Fusion from a Linux system?
A: Yes! The driver can access Google Data Fusion from Linux, Unix, and other non-Windows platforms.