Connector Settings
Connector settings determine how metadata is collected and displayed after a data source is crawled. These configurations play a critical role in ensuring only relevant metadata is captured, improving visibility within the Data Catalog, and enabling more effective data governance.
For instance, selecting metadata types such as Tables, Views, and Columns will limit the crawling process to these specific entities, even if additional objects like Functions and Procedures exist in the source.
Connector settings can vary depending on the type of connection. The commonly available connector setting categories are listed below:
Crawler
Profiler
Query Policies
Access Instruction
Business Glossary Settings
Anomaly Detection Settings
Lineage
Others
Access Connector Settings
Connector settings become available after the connector is saved, with or without validation. To access and configure these settings:
Navigate to the Connectors page.
Select the connector using the radio button.
Click the nine-dot icon to open the settings menu.
Select Settings to open the Connector Settings page.
After opening the Connector Settings page, configure the required options based on specific metadata needs. Once the configurations are complete:
Click Save Changes to apply the updated settings.
During the next crawl, the system will collect metadata according to these configurations.
For example, selecting Reports in the Crawler settings ensures that only report-related metadata is captured and displayed in the Data Catalog.
All connector settings are described in the following section:
Crawler Settings
One of the key settings is the Crawler, which gathers metadata from the source. Once a connector connects to a data source, it captures and catalogs various metadata elements such as schema names, table names, column details, views, procedures, relationships, reports, and storage buckets or folders.
Crawler Components
The Crawler consists of the following settings:
Crawler Options
Crawler Rules
Query Log Settings
Crawler Options
Crawler Options specify which metadata should be fetched from the data source and displayed in the Data Catalog. Based on the selected checkboxes, which may vary by connector, the corresponding metadata will be cataloged within the application.
Crawler Rules
Crawler Rules use regular expressions (regex) to filter metadata during crawling. These expressions match patterns based on naming conventions or character sequences to include or exclude specific data.
Crawler Logic Rules
Crawler Logic Rules define which metadata elements to include or exclude during the crawling process. These elements can be schemas, tables, views, columns, procedures, or functions, based on specified patterns.
After configuring the rules, save the changes and click the Crawl/Profile button. The matching results appear in the Schema Name (Advanced) pop-up window during Crawling and Profiling, based on the applied regular expressions (regex).
The logical pattern for each rule follows one of two regex formats:
Format-1:
Any characters: [(.*)]
Any object name contains ‘customer’: [.*customer.*]
Any object name contains ‘customer’ or ‘prospect’: [.*customer.*|.*prospect.*]
Any object name starting with ‘customer’: [customer.*]
Any object name ending with ‘customer’: [.*customer]
Format-2:
In this format, patterns are separated by commas.
Supported patterns are start with, end with, and equals
Except for equals, all other patterns are case-sensitive.
Any object name starts with ‘customer’: customer%
Any object name ends with ‘customer’: %customer
Examples for Format-2:
Lineage, lineagecolumn: matches both object names lineage and lineagecolumn.
Prospect, customer%: matches the object name prospect and any name starting with customer.
oetable, oecolumn, %customer: matches the object names oetable and oecolumn, and any name ending with customer.
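To make the Format-2 semantics concrete, here is a minimal sketch in Python (illustrative only, not the product's implementation): a trailing % is treated as "starts with", a leading % as "ends with", and a bare token as "equals" (the only case-insensitive match, per the rule above).

```python
def matches_format2(rule: str, name: str) -> bool:
    """Return True if the object name matches any Format-2 pattern in the rule."""
    for pattern in (p.strip() for p in rule.split(",")):
        if pattern.endswith("%"):          # "starts with" (case-sensitive)
            if name.startswith(pattern[:-1]):
                return True
        elif pattern.startswith("%"):      # "ends with" (case-sensitive)
            if name.endswith(pattern[1:]):
                return True
        elif name.lower() == pattern.lower():  # "equals" (case-insensitive)
            return True
    return False

# The examples from above:
names = ["prospect", "customer_2024", "oetable", "oecolumn", "big_customer"]
print([n for n in names if matches_format2("prospect, customer%", n)])
# -> ['prospect', 'customer_2024']
print([n for n in names if matches_format2("oetable, oecolumn, %customer", n)])
# -> ['oetable', 'oecolumn', 'big_customer']
```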
Let's look at an example of an advanced regex.
Objective: Write a regex that can be used to either include or exclude any Database Schemas, Tables, Columns, Views, Procedures, or Functions that have "BKP" or "bkp" or any number anywhere in the string.
Building the Regex: Here is an advanced regex, [.*(BKP|bkp).*|.*\\d+.*]. Let's break it down to understand it better; the regex is made up of different characters and symbols that have specific meanings.
The expression is enclosed in square brackets, "[" and "]", which the Format-1 notation uses to delimit the pattern. The "." character matches any single character, and the "*" quantifier matches zero or more occurrences of the preceding element, so ".*" matches any sequence of characters.
The "(" and ")" characters group the alternatives, and the "|" symbol is an alternation operator that matches either the expression before it or the expression after it. The expression "BKP|bkp" therefore matches either "BKP" or "bkp".
The "\\d" token matches a digit, and "+" matches one or more occurrences, so ".*\\d+.*" matches any name that contains at least one digit.
When setting up the regex rules, the system enables writing rules to include or exclude tables, views, columns, procedures, and functions based on specified criteria such as starting with, ending with, or containing specific characters.
Include Schema Regex
"a(.*)"
To target only schemas starting with "a", enter "a(.*)" into the Include Schema Regex field. In this case, schemas that begin with the letter "a" are considered.
Exclude Schema Regex
"(.*)a"
To target only schemas ending with "a", enter "(.*)a" into the Exclude Schema Regex field. In this case, schemas that end with the character "a" are excluded, and the remaining schemas are considered.
Include Table and View Regex
"e(.*)"
To target tables and views starting with "e", enter "e(.*)" into the Include Table Regex field. In this case, tables and views that start with the character "e" are considered.
Exclude Table and View Regex
"(.*)e"
To target tables and views ending with "e", enter "(.*)e" into the Exclude Table Regex field. In this case, tables and views that end with the character "e" are excluded, and the remaining tables are considered.
Include Column Regex
"a(.*)"
To target columns starting with "a", enter "a(.*)" into the Include Column Regex field. In this case, columns that start with the character "a" are considered.
Exclude Column Regex
"n(.*)"
To target a column ending with "n", enter "n(.*)" into the Exclude Column Regex field. In this case, the columns that end with the character "n” are excluded and the remaining column details are considered.
Include Procedure & Function Regex:
"s(.*)"
To target procedures and functions starting with "s", enter "s(.*)" into the Include Procedure & Function Regex field. In this case, procedures and functions that start with the character "s" are considered.
Exclude Procedure & Function Regex:
"s(.*)"
To target a Procedure & Function ending with "s", enter "s(.*)" into the Procedure & Function Regex field. In this case, the columns that start with the character "s” are considered.
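Putting include and exclude rules together, the following sketch (illustrative only; not the product's implementation) shows how a crawler might filter object names: a name is kept when it matches the include pattern and does not match the exclude pattern.

```python
import re

def filter_objects(names, include_regex=None, exclude_regex=None):
    """Keep names matching the include pattern and not matching the exclude pattern."""
    kept = []
    for name in names:
        if include_regex and not re.fullmatch(include_regex, name):
            continue  # include rule set, but this name does not match it
        if exclude_regex and re.fullmatch(exclude_regex, name):
            continue  # exclude rule matches, so drop this name
        kept.append(name)
    return kept

tables = ["employee", "exports", "sales", "invoice"]
# Include tables starting with "e", then exclude those ending with "e".
print(filter_objects(tables, include_regex=r"e(.*)", exclude_regex=r"(.*)e"))
# -> ['exports']  ("employee" matches the include rule but is then excluded)
```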
Query Log Settings
Data crawling of queries is supported through two methods: using the crawler or the query log. Queries are essential for building data lineage and relationships.
Query Log Settings allow selection of the query type and lookback period. The query text is predefined based on the connector, but can be customized if needed. It is also possible to specify the schema for retrieving queries and exclude certain users. The purpose of these settings is to fetch queries from the data source, process them, and create data lineage.
Various options to configure query log settings include:
Query
Displays the queries used to retrieve the Query Logs from the schema.
Look Back Period
This feature fetches queries processed in the data source over a specified number of past days.
For example, the look-back period can be set to 1 day, 2 days, or more, up to a maximum of 7 days.
Include Query Type
Include query types such as Select, Insert, Update, and Delete.
Exclude Users
In the Query Logs, there is an option to exclude unnecessary users. If a username is specified in the excluded user field, the query log for queries executed by that user is excluded from retrieval.
Schema
There is an option to select a specific schema to apply query log settings. If this is left blank, the query log is fetched for all schemas available in that database connection.
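Conceptually, these settings combine into a single filter over the source's query history. The sketch below is illustrative only; the record field names (executed_at, query_type, user, schema) are assumptions for the example, not the product's API.

```python
from datetime import datetime, timedelta

def select_query_logs(logs, look_back_days=7,
                      include_types=("SELECT", "INSERT", "UPDATE", "DELETE"),
                      exclude_users=(), schema=None):
    """Filter query-log records by look-back period, query type, user, and schema."""
    cutoff = datetime.now() - timedelta(days=look_back_days)
    return [
        q for q in logs
        if q["executed_at"] >= cutoff          # within the look-back period
        and q["query_type"] in include_types   # an included query type
        and q["user"] not in exclude_users     # not an excluded user
        and (schema is None or q["schema"] == schema)  # blank schema = all schemas
    ]

logs = [
    {"executed_at": datetime.now() - timedelta(days=1), "query_type": "SELECT",
     "user": "etl_service", "schema": "sales"},
    {"executed_at": datetime.now() - timedelta(days=10), "query_type": "INSERT",
     "user": "analyst1", "schema": "sales"},
]
print(len(select_query_logs(logs, look_back_days=7, exclude_users=("system",))))  # -> 1
```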
Profiler Settings
Profiling collects statistics and summaries from connected data sources to assess data quality before analysis. It identifies relationships among tables at the entity level and detects patterns between them. Crawling the data source is required before initiating profiling.
The profiling process includes the following components:
Profile Options
Profile Rules
Profile Settings
Profile Options
Profile options enable users to select the type of metadata to profile.
By default, for any RDBMS, the checkboxes for Tables and Columns, Views and Columns are selected. Selecting these options enables profiling of tables, views, and their respective columns from the data source. These options may vary dynamically based on the connector.
Profile Rules
In Profiling, regular expressions (regex) are used to refine results by matching patterns against data in the source system. These expressions can identify data that starts with, ends with, or contains specific letters or numbers, allowing targeted filtering of relevant information.
There are two types of profiler rules: Include and Exclude.
Include Regex is used to specify table, view, and column names to be included in profiling based on defined patterns.
Exclude Regex is used to define names to be excluded from profiling. Profile Rules function similarly to Crawler Rules.
Profile Settings
Profile Settings provide options to schedule the profiling of data sources with advanced configuration capabilities, as outlined below:
Order
When multiple profiling jobs are scheduled, execution follows the defined order number. For example, a job with Order 2 will run after the job with Order 1.
Day
Specifies the day of the week when profiling is allowed to run. Profiling actions (manual or scheduled) are restricted to the selected day. The system will not permit profiling on any other day.
Start/End Time
The start and end times between which profiling is permitted to run.
Number of Threads
Each thread represents one query executing on the database. The number of threads determines how many queries can be run simultaneously on the data source.
Profile Type
There are four types of data profiling.
Sample: The profile is generated using a designated Sample Profile Size. It calculates statistics such as Min, Max, Distinct, and Null Counts for each column. However, the results may differ from a full profile, since statistics are calculated only on the sampled rows.
To perform this type, set the profile type to Sample and define the number of records under Sample Profile Size.
Auto: When the Row Constraint is set to True, and the Table Row Count exceeds the configured Row Count Limit, profiling is performed on the number of rows specified in the Sample Profile Size.
If the Table Row Count is below the Row Count Limit, all rows in the table are profiled.
When the Profile Type is set to Auto, it always relies on the Row Count Limit when the Row Constraint is True.
Query: When the Table Row Count is below the Row Count Limit, profiling is performed on the entire table. If the Table Row Count exceeds the Row Count Limit, profiling is skipped to avoid performance issues.
Disabled: Profiling is completely disabled for the selected data source.
Here is a detailed explanation with an example for each Profile type:
Table Name: Sales_Data
Total Rows: 50,000
Sample Profile Size: 10,000 rows
Row Count Limit: 30,000 rows
Row Constraint: Enabled
Sample Data Count: 100
Sample
Profiles 10,000 rows based on the configured Sample Profile Size. Calculated metrics include Min, Max, Distinct, and Null Counts. These results are based only on the sampled records.
Auto
Since the Table Row Count (50,000) exceeds the Row Count Limit (30,000), profiling runs on 10,000 rows as defined in the Sample Profile Size.
If the table had 20,000 rows, profiling would cover all rows, as it falls within the Row Count Limit.
Query
Profiling is skipped because the Table Row Count (50,000) exceeds the Row Count Limit (30,000).
If the row count were 25,000, profiling would be performed on the full table.
Disabled
Profiling is not executed, regardless of table size or other configuration settings.
Summary of the example:
Table Size: 50,000 rows
Row Count Limit: 30,000 rows
Sample Profile Size: 10,000 rows
| Profile Type | Description |
| --- | --- |
| Sample | Profiles 10,000 rows based on the Sample Profile Size. |
| Auto | Profiles 10,000 rows, as the table size exceeds the Row Count Limit. |
| Query | Skips profiling, as the table size exceeds the Row Count Limit. |
| Disabled | No profiling is performed. |
Row Count Constraint
Applies only when the Profile Type is set to Auto.
If the Row Count Limit (100) is less than the Table Row Count (1000), profiling is based on the Sample Profile Size.
If the Row Count Limit (1000) is greater than the Table Row Count (100), profiling runs on the full table, ignoring the limit.
Row Count Limit
Enter the maximum number of rows to be considered for profiling.
Sample Data Count
Enter the number of rows displayed on the table data page in the Catalog.
Sample Profile Size
Enter the number of rows to be used as the sample during profiling.
Query Timeout
Enter the maximum duration in seconds for the query to run on the remote database before timeout.
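The interplay of Profile Type, Row Count Limit, and Sample Profile Size can be summarized in a short sketch (an illustrative reading of the rules above, reusing the Sales_Data numbers; the function name is hypothetical):

```python
def rows_to_profile(profile_type, table_rows, sample_size, row_limit, row_constraint=True):
    """Return the number of rows to profile, or None if profiling is skipped."""
    if profile_type == "Disabled":
        return None                       # profiling never runs
    if profile_type == "Sample":
        return sample_size                # always profile the configured sample
    if profile_type == "Auto":
        if row_constraint and table_rows > row_limit:
            return sample_size            # large table: fall back to the sample size
        return table_rows                 # small table: profile every row
    if profile_type == "Query":
        return table_rows if table_rows <= row_limit else None  # skip large tables
    raise ValueError(f"Unknown profile type: {profile_type}")

# Sales_Data example: 50,000 rows, Sample Profile Size 10,000, Row Count Limit 30,000.
for ptype in ("Sample", "Auto", "Query", "Disabled"):
    print(ptype, rows_to_profile(ptype, 50_000, 10_000, 30_000))
# Sample 10000 / Auto 10000 / Query None / Disabled None
```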
Query Policies
Query Policies are integrated into the connector settings for connectors that support Query Sheet functionality.
Data Access Authorization
Data Access Authorization is configured within the connector settings specific to the crawler. It ensures that only authorized users can access the Query Sheet and related queries in the Data Catalog. User credentials are validated before access is granted.
Two permission models are supported:
OvalEdge follows OvalEdge data permissions
When selected, a service account is used to establish the connection for the Query Sheet page, Data Catalog > Table > Data page, and Data Catalog > Query page to fetch the required data.
OvalEdge follows Remote system permissions
When selected, users are prompted to enter credentials for the remote data source in the Query Sheet page, Data Catalog > Table > Data page, and Data Catalog > Query page. These credentials are cached and automatically deleted after 7 days of inactivity.
Query Policies manage the use of specific query types such as Join, Union, Group, Sum, and Avg in the Query Sheet. For example, SYSADMIN roles can be restricted from executing Join operations.
To restrict users from using specific Query Types, the following steps need to be followed:
Navigate to Settings and open the Connector Settings page.
Click on Query Policies to view existing policy details in a data grid with Role, Query Type, Access Type, and Delete columns.
Click the +Add Query Policy button to add a new policy. Inline options for Role, Query Type, Access Type, and Cancel will be displayed.
Select the desired Role, Query Type, and Access Type, then click Save Changes.
The new policy will appear in the data grid.
Example: If the Role “OE_ADMIN”, Query Type “JOIN”, and Access Type “DENY” are selected, users with OE_ADMIN privileges will be restricted from using the JOIN function in the Query Sheet. If attempted, the message “User does not have permission on this <TableName> to perform this Function: Join” will be displayed.
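A minimal sketch of how such a policy table might be evaluated (illustrative only; the internal representation is an assumption):

```python
def is_denied(policies, role, query_type):
    """Return True if any policy denies the given role the given query type."""
    return any(
        p["role"] == role and p["query_type"] == query_type and p["access_type"] == "DENY"
        for p in policies
    )

policies = [{"role": "OE_ADMIN", "query_type": "JOIN", "access_type": "DENY"}]
if is_denied(policies, "OE_ADMIN", "JOIN"):
    # Message shown in the Query Sheet, per the example above.
    print("User does not have permission on this <TableName> to perform this Function: Join")
```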
Access Instruction
Access Instructions provide relevant details about a data source, which can be shared in formats such as links, images, or videos. Once added in the Access Instruction settings by the Integration Admin for a connection, the instruction appears on the Data Catalog > Data Objects summary page after crawling is completed.
Business Glossary Settings
Business Glossary Settings enable data source admins/owners to define how business terms are linked with data objects in their database. These settings take precedence over the data association preferences configured on the business term itself.
The title and business description of the term can be copied to the data object if selected on the term.
The Steward, Custodian, Owner, and Governance Roles 4-6 of the business term cannot be assigned to the data object, even if selected on the term.
Data can be restricted or masked if selected on the term.
The classification defined on the term can be shown on the data object in the catalog if selected on the term.
Anomaly Detection Settings
During profiling, the system compares newly profiled data with previously profiled data and identifies anomalies. It flags any inconsistency as an anomaly.
Two anomaly detection algorithms are available and configurable through system settings.
Interquartile Range (IQR)
The Interquartile Range (IQR) algorithm applies a statistical method to measure the spread of a dataset. It focuses on the middle 50% of the data, specifically the range between the first quartile (Q1) and the third quartile (Q3).
The steps below are involved in the calculation:
Sort the Data:
Arrange the dataset in ascending order.
Calculate Quartiles:
Q1 (First Quartile): This is the median of the lower half of the dataset.
Q3 (Third Quartile): This is the median of the upper half of the dataset.
Calculate IQR:
IQR is determined by subtracting Q1 from Q3: IQR = Q3 - Q1.
Identify Outliers:
Any data points below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are considered potential outliers.
Example: Consider profile executions with the following row counts:
Profile 1 Row Count - 40
Profile 2 Row Count - 67
Profile 3 Row Count - 87
Profile 4 Row Count - 76
Profile 5 Row Count - 85
Profile 6 Row Count - 70
Profile 7 Row Count - 80
Applying the steps to this example:
Sort the Data:
40, 67, 70, 76, 80, 85, 87
Calculate Quartiles:
Q2 (Second Quartile): 76 (Overall Median)
Q1 (First Quartile): Median of the First Half (40, 67, 70) = 67
Q3 (Third Quartile): Median of the Second Half (80, 85, 87) = 85
Calculate IQR:
IQR is determined by subtracting Q1 from Q3:
IQR = Q3 - Q1
IQR = 85 - 67 = 18
Identify Outliers:
Any data points below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are considered potential outliers.
Q1 - 1.5 * IQR = 67 - 1.5 * 18 = 40
Q3 + 1.5 * IQR = 85 + 1.5 * 18 = 112
Assuming the 8th profile has been completed:
Case 1: The row count is 150
Case 2: The row count is 90
In Case 1, the system reports an anomaly because 150 falls outside the range of 40 to 112.
In Case 2, the system does not report an anomaly because 90 falls within the range.
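The same IQR check can be expressed in a few lines of Python (a minimal sketch of the statistical method described above, using the seven profile row counts):

```python
import statistics

def iqr_bounds(values):
    """Compute the outlier bounds Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    data = sorted(values)
    mid = len(data) // 2
    q1 = statistics.median(data[:mid])                    # median of the lower half
    q3 = statistics.median(data[mid + (len(data) % 2):])  # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

row_counts = [40, 67, 87, 76, 85, 70, 80]
low, high = iqr_bounds(row_counts)
print(low, high)   # -> 40.0 112.0, matching the bounds computed above
for new_count in (150, 90):
    print(new_count, "anomaly" if not (low <= new_count <= high) else "ok")
# 150 is an anomaly; 90 is not.
```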
Deviation
The Deviation algorithm evaluates the percentage change in profiling metrics across supported data objects. It compares the statistics of the last profiled data with the recently profiled data to detect anomalies. The algorithm calculates the percentage increase or decrease between the two profiles.
Example: Consider a table with the following row counts:
Row Count in First Profiling: 50
Row Count in Second Profiling: 85
Now, let's compute the deviation percentage:
Deviation Percentage = ((Recent Profiled - Last Profiled) / Last Profiled) * 100
Deviation Percentage = ((85 - 50) / 50) * 100 = 70%
The deviation percentage between the first and second profiling is 70%.
Case 1: If the threshold is set to 50%, the system reports this as an anomaly due to a positive deviation.
Case 2: If the threshold is set to 75%, the system does not report an anomaly, as the deviation remains within the defined limit.
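The deviation check reduces to one formula. A minimal sketch using the example values:

```python
def deviation_pct(last, recent):
    """Percentage change between the last and the most recent profile."""
    return (recent - last) / last * 100

dev = deviation_pct(50, 85)
print(dev)   # -> 70.0
for threshold in (50, 75):
    # Positive or negative changes beyond the threshold are flagged.
    print(threshold, "anomaly" if abs(dev) > threshold else "within limit")
# threshold 50 -> anomaly; threshold 75 -> within limit
```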
Global Settings
By default, the system applies anomaly settings at the global level. Configuration occurs at the application level under Administration > System Settings.
The following system settings are available:
anomaly.detection.analysis.enabled (Default: TRUE)
Activates or deactivates the anomaly detection feature within the application. If set to "true", the feature is active and functions as intended. If set to "false", the feature is deactivated, and the connector-level and data-object-level anomaly settings appear grayed out.
anomaly.detection.analysis.algorithm.list (Default: Deviation, IQR)
Configures the anomaly detection analysis algorithms. Enter the algorithm names separated by commas; supported algorithms are 'iqr' and 'deviation'.
anomaly.detection.analysis.algorithm.selection (Default: IQR)
Specifies the algorithm used for anomaly detection on data objects.
anomaly.detection.analysis.iqr.range (Default: 1-50)
Sets the threshold percentage of data change (positive or negative); an anomaly is triggered if the change exceeds this range.
anomaly.detection.analysis.deviation.threshold (Default: 50)
Defines the deviation percentage threshold for triggering anomalies. Enter a numerical value; an anomaly is triggered when the deviation exceeds this threshold.
Lineage (Applicable for Reports & ETL Connectors)
This setting allows configuration of multiple SQL dialects and assignment of connector priorities to support lineage generation from ETL tools and reports.
Select server dialects to use for parsing source code: Select one or more SQL dialects to parse source code from ETL connectors. The system processes the code using the dialects in the specified order; if the first dialect fails to parse the code, the next one is used, and so on. If none of the dialects can parse the code, the system displays a "parse failed" message.
Connector Priority to Specify Connection for Table Lineage: When the same table name exists in more than one cataloged connector, connector priority decides which connector to use for table lineage. The system resolves the object based on the order in which connectors are listed.
For example, if there are connectors MySQL X, Oracle Y, and DB2 Z, and the customer table exists in both Oracle Y and DB2 Z, the lineage will be taken from Oracle Y if it is listed before DB2 Z.
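Resolution by connector priority amounts to a first-match lookup, as this illustrative sketch shows (connector and table names are taken from the example above):

```python
def resolve_lineage_connector(table_name, connector_priority, catalog):
    """Return the first connector in priority order that contains the table."""
    for connector in connector_priority:
        if table_name in catalog.get(connector, set()):
            return connector
    return None  # table not cataloged in any connector

catalog = {
    "MySQL X": {"orders"},
    "Oracle Y": {"customer", "orders"},
    "DB2 Z": {"customer"},
}
priority = ["MySQL X", "Oracle Y", "DB2 Z"]
print(resolve_lineage_connector("customer", priority, catalog))  # -> 'Oracle Y'
```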
Connection Pooling
The Connectors module provides configurable connection pooling settings within the application. This feature enables precise management of database connection parameters for improved flexibility, performance, and consistency across all RDBMS connectors.
A Connection Pooling tab is available under Administrator > Connectors > Settings for configuring database connection parameters.
Configurable Parameters
Maximum Pool Size: Maximum number of active connections allowed in the pool.
Minimum Idle Connections: Minimum number of idle connections kept ready for use.
Connection Timeout: Maximum time allowed to establish a new connection.
Idle Timeout: Duration an idle connection remains before being closed.
Query Timeout: Maximum execution time permitted for a query.
Validation Timeout: Time allocated to validate a connection before use.
Leak Detection Threshold: Duration used to detect potential connection leaks.
Configuration Behavior
Default values are pre-applied and can be modified as required. Validation checks ensure input accuracy. Updated values are applied during the next job execution without affecting active sessions.
Connection pooling is standardized across all RDBMS connectors, with logs and metrics available for monitoring.
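As a rough illustration, the parameters could be modeled and validated as below; the field names and default values here are assumptions for the sketch, not the product's actual defaults.

```python
from dataclasses import dataclass

@dataclass
class ConnectionPoolSettings:
    # Defaults below are placeholders, not the product's shipped values.
    max_pool_size: int = 10               # maximum active connections in the pool
    min_idle: int = 2                     # idle connections kept ready for use
    connection_timeout_ms: int = 30_000   # max time to establish a new connection
    idle_timeout_ms: int = 600_000        # how long an idle connection may live
    query_timeout_s: int = 300            # max execution time for a query
    validation_timeout_ms: int = 5_000    # time allotted to validate a connection
    leak_detection_threshold_ms: int = 0  # 0 disables leak detection

    def validate(self):
        """Basic sanity checks before the settings are applied."""
        if self.min_idle > self.max_pool_size:
            raise ValueError("min_idle cannot exceed max_pool_size")
        for name, value in vars(self).items():
            if value < 0:
                raise ValueError(f"{name} must be non-negative")

settings = ConnectionPoolSettings(max_pool_size=20, min_idle=5)
settings.validate()  # passes; updated values apply on the next job execution
```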
Steps to Access and Update
Go to the Connectors screen.
Select the desired RDBMS connector.
Open the nine-dot menu and select Settings.
Open the Connection Pooling tab.
Configure the required parameters.
Click Save Changes to apply.

Others (Applicable for Reports & RDBMS)
Enable/Disable Metadata Change Notifications
To enable or disable metadata change notifications, configure the Send Metadata Changes Notifications setting. Based on the configuration, the system notifies Owners, Stewards, Custodians, and other Governance Roles.
Context URL (Applicable only for Reports)
The browser extension uses the Context URL feature in Connector Settings to associate metadata with specific URLs. This allows the extension to retrieve metadata from the catalog even when used outside the application.
To activate the Context URL feature for a particular data source, follow these steps:
Select the Settings option and navigate to the Connector Settings page.
Select the Context URL option.
Enter the browser URL of the data source (e.g., https://tableauad.xxxx.xxx).
Click + Add New URL to add more entries.
Once configured, the Browser Extension connects the application with the source report database. This enables users to view the Certification Status or Endorsement Rating of a report directly from the application.