Pentaho
An out-of-the-box connector is available for Pentaho. It supports crawling datasets (Dataflows and Datasets) and building lineage.
OvalEdge supports five types of Pentaho integration:
File Repository type
Server (API) type
Repository (Database extract) type
GitLab
GitLab RestAPI
Currently, crawling and lineage building are available for the File Repository, Server (API), GitLab, and GitLab RestAPI types.
To work with the File Repository type, specify the path to the Pentaho server file repository where the Pentaho files are located.

User Permissions
The following are the minimum permissions required for OvalEdge to validate the Pentaho connection.
Permission: USAGE
Roles: Crawler Admin
Super User: OE ADMIN
For a file path, the user needs access to that folder.
For GitLab, the user needs read access to the project that contains the Pentaho files.
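Read access to the GitLab project can be checked outside OvalEdge before the connection is validated. The following is a minimal sketch, not part of the connector: it builds a request against the GitLab REST API v4 repository tree endpoint. It assumes a personal access token with the read_api scope rather than the username and password used by the connector, and the host name and project path are hypothetical.

```python
import urllib.parse
import urllib.request


def build_tree_request(gitlab_url: str, project_path: str, token: str) -> urllib.request.Request:
    """Build a GitLab REST API v4 request that lists a project's repository tree."""
    # GitLab accepts a URL-encoded "namespace/project" path in place of a numeric project ID.
    project_id = urllib.parse.quote(project_path, safe="")
    query = urllib.parse.urlencode({"recursive": "true", "per_page": 100})
    url = f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}/repository/tree?{query}"
    # Assumption: authentication with a personal access token (read_api scope).
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})


if __name__ == "__main__":
    # Hypothetical host and project path, for illustration only.
    req = build_tree_request("https://gitlab.example.com", "data-team/pentaho-files", "<token>")
    print(req.full_url)
    # Sending the request with urllib.request.urlopen(req) should succeed (HTTP 200)
    # only when the account can read the project.
```

If the request is rejected, the account most likely lacks read access to the project that holds the Pentaho files.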
Technical Specifications
Crawling
Feature | Supported Objects | Remarks
Crawling | Pentaho projects are kept as schemas. Job files and transformation files are read from the specified path. | The files are provided as datasets and source code.
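To preview what a file-system crawl might pick up, the folder can be scanned for job and transformation files. This is a rough sketch, not the connector's implementation. It assumes the standard Pentaho Data Integration extensions (.kjb for jobs, .ktr for transformations) and treats each top-level folder under the path as a project; the function name is hypothetical.

```python
from pathlib import Path

# Assumption: standard Pentaho Data Integration file extensions.
KIND_BY_SUFFIX = {".kjb": "Job", ".ktr": "Transformation"}


def find_pentaho_files(root: str) -> dict[str, list[tuple[str, str]]]:
    """Group job and transformation files by top-level folder (treated here as the project)."""
    root_path = Path(root)
    projects: dict[str, list[tuple[str, str]]] = {}
    for path in sorted(root_path.rglob("*")):
        kind = KIND_BY_SUFFIX.get(path.suffix.lower())
        if kind is None or not path.is_file():
            continue  # skip folders and unrelated files
        rel = path.relative_to(root_path)
        # Files directly under the root are grouped under the root folder's own name.
        project = rel.parts[0] if len(rel.parts) > 1 else root_path.name
        projects.setdefault(project, []).append((rel.as_posix(), kind))
    return projects
```

Calling `find_pentaho_files` on the configured path returns a mapping from project name to a list of (relative path, object type) pairs, which can be compared with the datasets that appear after crawling.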
Lineage
Lineage entities | Details
Table - File Lineage | Supported
File - Table Lineage | Supported
Column Lineage - File Column Lineage | Supported
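As an illustration of where table-to-file lineage comes from, the sketch below reads the hops of a transformation and pairs each source step with its target step. It is an assumption-laden example rather than OvalEdge's lineage logic: the sample XML is a heavily simplified stand-in for a .ktr file, and the step names are invented.

```python
import xml.etree.ElementTree as ET

# Heavily simplified stand-in for a .ktr file; real files carry many more elements.
SAMPLE_KTR = """
<transformation>
  <order>
    <hop><from>Read customers</from><to>Write customers file</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Read customers</name><type>TableInput</type></step>
  <step><name>Write customers file</name><type>TextFileOutput</type></step>
</transformation>
"""


def hops_to_edges(ktr_xml: str) -> list[tuple[str, str, str, str]]:
    """Return (source step, source type, target step, target type) for each enabled hop."""
    root = ET.fromstring(ktr_xml)
    step_types = {s.findtext("name"): s.findtext("type") for s in root.findall("step")}
    edges = []
    for hop in root.iter("hop"):
        if hop.findtext("enabled", default="Y") != "Y":
            continue  # disabled hops carry no data, so they contribute no lineage
        src, dst = hop.findtext("from"), hop.findtext("to")
        edges.append((src, step_types.get(src), dst, step_types.get(dst)))
    return edges


if __name__ == "__main__":
    for edge in hops_to_edges(SAMPLE_KTR):
        print(edge)
```

In this sample, a table input step feeding a text file output step corresponds to a Table - File lineage entry; the reverse direction corresponds to File - Table lineage.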
Connection Details
To connect to Pentaho, add the following connection settings:
Log in to the OvalEdge application.
Navigate to Administration > Crawler module.
Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
Select the connection type as PENTAHO. A pop-up window is displayed. Depending on the Crawl From value:
1. File system: provide the File Path where the Pentaho files are located.
2. GitLab or GitlabRestApi: provide the following details:
GitLab username
GitLab password
GitLab URL
3. The following are the field attributes required for the connection.
Property | Details
Connection Type | Pentaho
License Type | Standard, Lineage
Connection Name | Enter a reference name that makes the Pentaho connection easy to identify in OvalEdge. Example: Pentaho Connection1
Crawl from | 1. File system (provide the path to the Pentaho files) 2. GitLab or GitlabRestApi (provide the GitLab authentication details)
GitLab Url | Database files URL (on-premises/cloud-based)
Path (if File System) | Path where the Pentaho files are located
Context params | Path of the folder that contains the contextparams.txt file. The dynamic values are read from this file.
Gitlab Username | User account login credential (only for Pentaho Authentication)
Gitlab Password | Password (only for Pentaho Authentication)
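The layout of contextparams.txt is not described here, so the following sketch is purely illustrative. It assumes one NAME=value pair per line and shows how such dynamic values could be substituted into the ${NAME} placeholders that Pentaho uses for variables. Both function names and the sample parameter names are hypothetical.

```python
import re


def parse_context_params(text: str) -> dict[str, str]:
    """Parse NAME=value lines (assumed format), skipping blanks and '#' comments."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def resolve(template: str, params: dict[str, str]) -> str:
    """Replace ${NAME} placeholders; unknown names are left untouched."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: params.get(m.group(1), m.group(0)), template)


if __name__ == "__main__":
    sample = "# hypothetical values\nSCHEMA = sales\nOUT_DIR=/data/out\n"
    params = parse_context_params(sample)
    print(resolve("${SCHEMA}.customers", params))
    print(resolve("${OUT_DIR}/customers.csv", params))
```

Leaving unknown placeholders untouched makes missing entries easy to spot when a table or file name in the lineage still contains ${...}.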
4. Once connectivity is established, crawling is enabled.
5. Click Crawl and Profile to list the projects where the Pentaho files are located.
6. Select the required project or schema, then start crawling to get the Pentaho jobs and transformations.
How to Validate the Lineage
1. Click Lineage to list all the job files from Pentaho (Object Type = Job).
2. Select the required job to build lineage for the selected source code. If the lineage builds successfully, the Lineage Status shows a successful lineage build.
3. Click the dataset name to open the dataset for which lineage was built. The Associations of the selected job list all the job steps.
4. Click an associated object of type Transformation (Associated Object Type = Transformation). You are redirected to the transformation, where the actual lineage is built.
5. Click the Associations tab of the transformation to see all its steps.
6. Click any associated object (a table or a file), and then click Lineage.
7. The lineage is displayed.
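The job, transformation, and step hierarchy followed in the steps above can be reproduced from the source files when cross-checking what OvalEdge shows. The sketch below is an illustration under assumptions, not the product's logic: the XML snippets are simplified stand-ins for .kjb and .ktr files, and all names in them are invented.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for a .kjb job file and the .ktr file it calls.
SAMPLE_KJB = """
<job>
  <name>load_customers</name>
  <entries>
    <entry><name>Start</name><type>SPECIAL</type></entry>
    <entry><name>Run customer load</name><type>TRANS</type><filename>customers.ktr</filename></entry>
  </entries>
</job>
"""
SAMPLE_KTRS = {
    "customers.ktr": """
<transformation>
  <step><name>Read customers</name><type>TableInput</type></step>
  <step><name>Write customers file</name><type>TextFileOutput</type></step>
</transformation>
"""
}


def transformation_entries(kjb_xml: str) -> list[tuple[str, str]]:
    """Return (entry name, transformation file) for job entries that run a transformation."""
    root = ET.fromstring(kjb_xml)
    return [(e.findtext("name"), e.findtext("filename"))
            for e in root.iter("entry") if e.findtext("type") == "TRANS"]


def step_names(ktr_xml: str) -> list[tuple[str, str]]:
    """Return (step name, step type) for every step in a transformation."""
    root = ET.fromstring(ktr_xml)
    return [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]


def walk_job(kjb_xml: str, ktr_sources: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each transformation entry of a job to the steps of its transformation."""
    result = {}
    for entry_name, filename in transformation_entries(kjb_xml):
        source = ktr_sources.get(filename)
        result[entry_name] = step_names(source) if source else []
    return result


if __name__ == "__main__":
    print(walk_job(SAMPLE_KJB, SAMPLE_KTRS))
```

The steps listed for each transformation entry should match what the Associations tab of that transformation shows in OvalEdge.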
Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA