AI Model Setup

An AI recommendation model is set up in three steps:

Step 1: Model Configuration

Defines the basic identity and purpose of the AI Model.

Fields to Configure
- Model Name & Description – Provide a clear identifier and purpose for the model.
- Domain/Category/Sub-category/Terms – Select the business glossary scope for which recommendations will be generated.
Proceed → Step 2: Object & Source Selection

Step 2: Object & Source Selection

Defines where the model runs and which data objects are included.

- Object Type – Choose the object level (Schema, Table, Table Column, File Column, Report Column, API Attribute, etc.).
- Source Type – Select whether the model runs on a Connector or specific Schemas.
  - Connector: Includes all schemas under the chosen connector.
  - Schema: Limits the run to selected schemas only.
- Run On – Choose whether to run on:
  - Delta Data – newly crawled or updated objects since the last run, or
  - All Objects – complete re-evaluation of all eligible objects.
- Include Objects Option – Select between:
  - Unassociated Objects (only those without terms), or
  - All Objects (including those already associated for re-evaluation).
- Tag Selection Box – Filter objects by selected business tags. Only objects containing these tags will be considered for recommendations.
- Do not recommend terms for objects with names matching the regex below – Enter one or more regular expressions to ignore objects matching those patterns.
- Except for objects whose names match the regular expression below – List object names or patterns that must still be included even if they match the exclusion regex.
- Notification Preference – Choose who receives alerts when recommendations are generated:
  - Steward of Term
  - Steward of Data Object
  - Both
  - None

Step 3: AI Configuration

Configures how the AI Model evaluates, scores, and recommends terms.

Smart Score Recommendations: This section defines the overall scoring thresholds used during model execution.
- Minimum Smart Score for Recommendation – The minimum score required for a term–object pair to appear in the recommendation list. Recommendations below this score will be ignored.
- Minimum Score for Auto-Acceptance of Recommendations – Score above which recommendations are automatically accepted.
Configure Smart Score Calculation
- Smart Score Boost for Name Matches - Adds a fixed score boost when the object name exactly matches the term name. (Default: 20)
- Smart Score Boost on Column Repetition - Adds a boost when the same column name appears multiple times in the term’s associated objects. (Default: 20)
- Smart Score Boost when Synonym Matches - Adds a boost if the object name matches one of the synonyms configured for the term. (Default: 10)
- Rejection Weightage (%) - Percentage deduction applied to similar recommendations that were previously rejected. Helps reduce false positives. (Default: 20%)
- Name, Data, and Pattern Weightage - Defines how much each factor contributes to the Smart Score. The total should be close to 100. Example: 50:25:25 → 50% Name, 25% Data, 25% Pattern.
Name Similarity Algorithm: This section controls how the AI compares term names and object names.
- Skip objects if Object Name is not equal to or part of Term Name or Term Synonyms: Ensures only objects with partial or exact name alignment are evaluated.
- Heuristic: Consider Object Name Pattern Regex configured at the Term: Enables regex-based name pattern matching at the term level.
- Algorithm Selection: (Select one)
  - Cosine Similarity – Vectorized text comparison using embeddings.
  - LLM (Semantic) – Uses a Large Language Model for context-aware matching.
  - Levenshtein Similarity – Edit-distance algorithm that measures character-level differences.
  - Fuzzy Logic – Basic approximate string matching.
Configure Data Algorithm: This section controls how the AI model evaluates data values between term-associated objects and new objects.
- Match Data from Associated Objects of the Term: Compares top profiled values between the term’s existing associated objects and the candidate object.
- Heuristic: Consider Object Data Pattern Regex Configured at the Term: Applies regex defined at term level for pattern-based value matching.
- Boost Smart Score when recommended object data type matches term-associated object data types: Adds a boost when the data types align (e.g., both are VARCHAR or INTEGER).
Pattern Matching: Controls whether pattern-based similarity (e.g., character or format pattern) is used in scoring.
- Toggle:
  - ON – AI analyzes data patterns such as digit counts or text formats (e.g., DDD-DD-DDDD for SSN).
  - OFF – Pattern-based scoring is disabled.

To create an AI Model in the OvalEdge application, follow these steps:

Navigate to Governance Catalog > Data Classifications Recommendations.
Click on Create Model. A pop-up with a three-step process will appear: Model Configuration, Object & Source Selection, and AI Configuration.

Model Configuration

Enter the following details and proceed to Object and Source Selection.

Field Name

Description

AI Model Name

Enter a clear and descriptive name for the AI Model to help users easily identify it in the list view.

Example: Finance_Domain_DCR_Model

AI Model Description

Provide a brief explanation of the model’s purpose or focus area.

Example: Recommends financial glossary terms to transaction-related tables.

Domain

Select the domain under which the AI Model should operate. Terms belonging to this domain will be considered for generating recommendations.

Object & Source Selection

After configuring the model details, click Continue to define the objects and source for recommendations.

Enter the following details and proceed to AI Configuration.

Field Name

Description

Object Type

Select the object level for which recommendations should be generated.

Example: Schema, Table, Table Column, File Column, Report Column, or API Attribute.

Source Type

Choose whether to run the model at a Connector level or a specific Schema level.

- Connector: Includes all schemas under the connector.

- Schema: Restricts execution to selected schemas only.

Refine Source Selection

Based on the "Connector" or "Schema" selection for the Source Type, specify the connectors or schemas to include.

Select Objects

Select the sources for which recommendations are required.

If the Object Type is “Table” and the Source Type is “Connector,” Refine Source Selection lists all connectors established in the system.
If the Object Type is “Table” and the Source Type is “Schema,” Refine Source Selection lists the schemas of all connectors established in the system.

A model can include multiple sources, up to a maximum of 20 at any given time. To locate sources, filter by Connector Name, Connector ID, or Connector Type when the Source Type is Connector, or by Connector Name or Schema Name when it is Schema. Select a source by clicking its row.

Run On

This section provides two options that define which objects the AI Model processes when generating recommendations:

- All Objects: Processes all objects of the chosen object type in the selected connectors or schemas.
- Delta Data: Processes only objects crawled since the last run, plus objects the AI Model has not previously analyzed.

Include Objects

This section presents two options that determine whether objects already associated with terms are processed:

- All Objects: Processes every object of the chosen object type in the selected connectors or schemas, regardless of term associations.
- Unassociated Objects: Processes only objects that have no term associations.
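The Run On and Include Objects options can be read as two independent filters applied to each candidate object. The sketch below illustrates that reading only; the `DataObject` record, its field names, and the `in_scope` helper are hypothetical and are not part of the OvalEdge application.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DataObject:
    # Hypothetical record; field names are illustrative only.
    name: str
    last_crawled: datetime
    last_analyzed: Optional[datetime] = None   # None = never seen by the model
    terms: list = field(default_factory=list)  # associated glossary terms


def in_scope(obj, run_on, include, last_run):
    """Return True if obj passes the Run On and Include Objects settings."""
    if run_on == "delta":
        never_analyzed = obj.last_analyzed is None
        changed = last_run is None or obj.last_crawled > last_run
        if not (never_analyzed or changed):
            return False
    # "unassociated" drops any object that already has a term.
    if include == "unassociated" and obj.terms:
        return False
    return True


last_run = datetime(2025, 1, 10)
objects = [
    DataObject("customer_email", datetime(2025, 1, 5), datetime(2025, 1, 10), ["Email"]),
    DataObject("order_total", datetime(2025, 1, 12), datetime(2025, 1, 10)),
    DataObject("new_signup_ts", datetime(2025, 1, 8)),
]
selected = [o.name for o in objects if in_scope(o, "delta", "unassociated", last_run)]
print(selected)  # ['order_total', 'new_signup_ts']
```

With Run On set to All Objects and Include Objects set to All Objects, every object in the list would pass.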

Consider objects with following tags

Filter the scope by selecting one or more tags. Only objects carrying the selected tags will be included in the recommendation run.

Example: Selecting tags Finance and CustomerData ensures only those tagged datasets are evaluated.

Do not recommend terms for objects with names matching the regex below

Exclude objects whose names match the given regular expression patterns. Helpful in ignoring temporary or system-generated objects.

Example: ^temp_, _backup$ excludes objects like temp_sales, orders_backup.

Except for objects whose names match the regular expression below

(Overrides the exclusion above.)

Include specific objects even if they match the exclusion regex patterns.

Example: With the exception pattern ^orders_, orders_backup is still considered even though it matches _backup$.
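The exclusion and exception settings can be pictured as a two-step check: an object is dropped when its name matches an exclusion pattern, unless it also matches an exception pattern. The following is a minimal sketch under stated assumptions: it uses Python's `re.search` semantics (a pattern matches anywhere unless it is anchored), which may differ from the regex dialect the application uses, and the `^orders_` exception pattern and the `is_eligible` helper are hypothetical.

```python
import re

# Exclusion patterns taken from the example in this section.
EXCLUDE = [r"^temp_", r"_backup$"]
# Hypothetical exception pattern, chosen only for illustration.
EXCEPT = [r"^orders_"]


def is_eligible(name, exclude=EXCLUDE, keep=EXCEPT):
    """True if the object should still receive recommendations."""
    excluded = any(re.search(p, name) for p in exclude)
    rescued = any(re.search(p, name) for p in keep)
    return (not excluded) or rescued


for name in ["temp_sales", "orders_backup", "customer_backup", "invoices"]:
    print(name, is_eligible(name))
# temp_sales False
# orders_backup True
# customer_backup False
# invoices True
```

Testing candidate patterns against a handful of real object names in this way before saving the model helps catch anchors (`^`, `$`) that do not behave as intended.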

Maximum Columns for AI Recommendation

Define the maximum number of columns to be processed in a single run to optimize performance and runtime.

Example: 10000 columns.

Notification Preference

Choose who is notified when recommendations are generated:

- Notify the Steward of the Term
- Notify the Steward of the Data Object
- Notify both the Steward of the Term and the Steward of the Data Object
- Notify None

AI Configuration

After selecting the Object and Source, click Continue to configure AI parameters for recommendations.

Define AI Score and other configurations for object recommendations, and click Save to create the AI Model.

Configuration Name

Description

Minimum Smart Score for Recommendation

Enter the minimum score required for a recommendation to appear in results. Any object–term pair scoring below this value will be excluded.

Example: Set 10 to hide every recommendation that scores below 10.

Minimum Score for Auto-Acceptance of Recommendations

Specify the score above which recommendations are automatically accepted and linked to terms without manual review.

Example: If 60 is set, all recommendations with ≥ 60 Smart Score are auto-accepted.

Default value: 0

Choose a value higher than the Smart Scores you typically see, so that only relevant recommendations are accepted automatically. The value must be a positive whole number.
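The two thresholds split recommendations into three bands. The sketch below illustrates this with the example values from this section (10 and 60); it assumes the comparison is inclusive, following the "≥ 60" example, and the `triage` function and its labels are hypothetical.

```python
MIN_SCORE = 10     # Minimum Smart Score for Recommendation (example value)
AUTO_ACCEPT = 60   # Minimum Score for Auto-Acceptance (example value)


def triage(score, min_score=MIN_SCORE, auto_accept=AUTO_ACCEPT):
    """Classify a Smart Score into one of three bands."""
    if score < min_score:
        return "ignored"          # never shown in the recommendation list
    if score >= auto_accept:
        return "auto-accepted"    # linked to the term without manual review
    return "needs review"         # shown, waiting for a steward's decision


for s in (5, 10, 59.9, 60, 85):
    print(s, triage(s))
# 5 ignored
# 10 needs review
# 59.9 needs review
# 60 auto-accepted
# 85 auto-accepted
```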

Smart Score Boost for Name Matches

Adds a fixed boost when the object name exactly matches the term name. Helps prioritize perfect matches.

Default: 20

Smart Score Boost for Column Repetition

Adds a score boost when a column name appears multiple times in the term’s associated objects. Reflects a stronger correlation.

Default value: 0.1

Smart Score Boost when Synonym Matches

Adds a fixed boost when an object name matches a configured synonym of the term.

Default: 10

Rejection Weightage (%)

Defines the penalty applied to recommendations similar to previously rejected ones, reducing their overall Smart Score.

Default: 20 %

Name, Data, Pattern Weightage

Sets how much each factor (Name, Data, and Pattern) contributes to the Smart Score.

Enter a percentage for each factor. The three values should add up to approximately 100.

Example: Name (30) : Data (30) : Pattern (40) gives 30% weight to Name, 30% to Data, and 40% to Pattern.

These weights are not set by default.
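To see how the weightages shift a result, the sketch below combines three component scores as a weighted average. It is an illustration only: it assumes each component score is on a 0 to 100 scale and that weights are normalized by their sum (so totals that are only close to 100 still work); the `weighted_score` function is hypothetical and the engine's exact formula is not documented here.

```python
def weighted_score(name, data, pattern, weights=(30, 30, 40)):
    """Combine component scores (each 0-100) using Name:Data:Pattern weights."""
    w_name, w_data, w_pattern = weights
    total = w_name + w_data + w_pattern
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total normalizes weights that do not sum to exactly 100.
    return (name * w_name + data * w_data + pattern * w_pattern) / total


print(weighted_score(80, 50, 20))                # 47.0 with 30:30:40
print(weighted_score(80, 50, 20, (50, 25, 25)))  # 57.5 with 50:25:25
```

The same object scores higher under 50:25:25 because its strongest signal, Name, carries more weight.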

Skip objects if Object Name is not equal to or part of Term Name or Term synonyms

When enabled, it excludes objects whose names are not equal to or part of the term name or its synonyms, improving precision.
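One way to picture this check is as a containment test between the object name and the term name or its synonyms. The sketch below assumes the comparison ignores case, underscores, and spaces; that normalization, and the `name_is_aligned` helper, are assumptions made for illustration rather than documented behavior.

```python
def name_is_aligned(object_name, term_name, synonyms=()):
    """True if the object name equals, or is part of, the term name or a synonym."""
    def norm(s):
        # Assumption: comparison ignores case, underscores, and spaces.
        return s.lower().replace("_", "").replace(" ", "")

    obj = norm(object_name)
    if not obj:
        return False
    return any(obj in norm(candidate) for candidate in (term_name, *synonyms))


print(name_is_aligned("email", "Email Address"))                     # True
print(name_is_aligned("cust_id", "Customer Identifier", ["Cust ID"]))  # True (via synonym)
print(name_is_aligned("order_total", "Email Address"))               # False
```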

Heuristic: Consider Object Name Pattern Regex configured at the Term

Applies name-pattern regex configured at the term level to enhance name matching.

Algorithm Selection

(Select one)

Choose the algorithm used for name similarity comparison:

- Cosine Similarity – Vectorized text comparison using embeddings. Pros: Captures context; faster than LLM.
- LLM (Semantic) – Uses a Large Language Model for context-aware matching.
- Levenshtein Similarity – Edit-distance algorithm that measures character-level differences.
- Fuzzy Logic – Basic approximate string matching.
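To make the difference between the character-based and vector-based options concrete, the sketch below implements a normalized Levenshtein similarity and a cosine similarity. It is not the engine's implementation: the cosine function uses character bigram counts as a simple stand-in for embeddings, and the scaling of edit distance to a 0 to 1 range is one common convention chosen for illustration.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Scale edit distance to 0..1, where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def cosine_similarity(a, b, n=2):
    """Cosine of character n-gram count vectors (a stand-in for embeddings)."""
    va = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    vb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))                         # 3
print(round(levenshtein_similarity("cust_email", "email"), 2))  # 0.5
print(round(cosine_similarity("cust_email", "email"), 2))
```

Neither function understands meaning: both would rate "email" and "electronic mail address" as weak matches, which is the kind of case the LLM option is described as handling.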

Configure Data Algorithm

Enable or disable data-value comparison during scoring. When ON, top profiled values are matched with term-associated objects.

Match Data from Associated Objects of the Term

Compares top profiled values between the term’s existing associated objects and the candidate object.

Example: If the term “Email” is associated with columns that hold email addresses, any column with similar values receives a boost.

Heuristic: Consider Object Data Pattern Regex Configured at the Term

Applies regex defined at the term level to detect pattern-based data matches (e.g., email or ID formats).

Example: [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,} for email detection.
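A term-level data regex such as the one above can be tried against sample values before it is saved on the term. The sketch below computes the share of values that match; it assumes whole-value matching (`fullmatch`) and Python's regex dialect, and the `match_ratio` helper and the sample addresses are hypothetical. How the engine turns matches into a score is not specified here.

```python
import re

# The email pattern from the example in this section.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def match_ratio(values, pattern=EMAIL):
    """Fraction of non-empty values that fully match the pattern."""
    values = [v for v in values if v]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


sample = ["ana@example.com", "bob.smith@example.org", "n/a", "carol@example.net"]
print(match_ratio(sample))  # 0.75
```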

Boost Smart Score when recommended object data type matches term-associated object data types

Adds a boost when the data type of the object matches the term’s associated data type (e.g., VARCHAR ↔ VARCHAR).

Pattern Matching Toggle

Controls whether pattern-based similarity (e.g., character or format pattern) is used in scoring.

Toggle:

ON – AI analyzes data patterns such as digit counts or text formats (e.g., DDD-DD-DDDD for SSN).

OFF – Pattern-based scoring is disabled.
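The kind of format pattern mentioned above (DDD-DD-DDDD) can be derived by replacing each character with a class symbol. The sketch below is a minimal illustration: the use of `A` for letters is an assumption (the text only shows `D` for digits), and `to_pattern` and `dominant_pattern` are hypothetical helpers, not the engine's profiling logic.

```python
from collections import Counter


def to_pattern(value):
    """Map digits to D and letters to A; keep punctuation as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("D")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def dominant_pattern(values):
    """Return the most common pattern and the share of values that follow it."""
    if not values:
        return None, 0.0
    counts = Counter(to_pattern(v) for v in values)
    pattern, hits = counts.most_common(1)[0]
    return pattern, hits / len(values)


print(to_pattern("123-45-6789"))  # DDD-DD-DDDD
pattern, share = dominant_pattern(["123-45-6789", "987-65-4321", "unknown"])
print(pattern, round(share, 2))   # DDD-DD-DDDD 0.67
```

Two columns whose dominant patterns agree are plausible candidates for the same term even when their names differ.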

Reset Preferences to Default

Click on this button to revert all customized settings to the system’s default configuration.

Smart Score Calculation

The Smart Score, computed by the DCR engine, expresses how strongly a data object (column, file column, report column, etc.) matches a glossary term. It is a composite score that combines multiple signals (Name, Data, Pattern, and Data Type), then applies configurable boosts and rejection penalties to produce the final ranking used for display and automation.

Parameters

Name Score
Data Score
Pattern Score
Rejected Score

Combining signals

The engine calculates each component score (Name, Data, Pattern, Data Type) according to chosen algorithms and profiling data.
Configured weightages determine the relative importance of Name vs Data vs Pattern in the combined signal (set in AI Configurations).
Boosts (exact name, synonym, repetition, regex, data type) are applied to increase confidence for strong matches.
Rejection penalties are applied to reduce scores for patterns similar to previous user rejections.
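The order of operations described above (combined signal, then boosts, then rejection penalty) can be sketched as follows. The exact formula is not given in this document, so this is an assumption-laden illustration: boosts are added as fixed points, the rejection weightage is applied as a percentage deduction, and the result is clamped to 0 to 100. The `apply_adjustments` function is hypothetical; the defaults of 20, 10, and 20% are the documented defaults for the name boost, synonym boost, and Rejection Weightage.

```python
def apply_adjustments(base, exact_name=False, synonym=False,
                      similar_to_rejected=False,
                      name_boost=20, synonym_boost=10, rejection_pct=20):
    """Apply boosts and the rejection penalty to a combined base score."""
    score = base
    if exact_name:
        score += name_boost            # object name equals the term name
    if synonym:
        score += synonym_boost         # object name equals a term synonym
    if similar_to_rejected:
        score *= 1 - rejection_pct / 100   # percentage deduction
    # Assumption: the final score is kept within 0..100.
    return float(max(0.0, min(100.0, score)))


print(apply_adjustments(70, exact_name=True))                                      # 90.0
print(round(apply_adjustments(70, exact_name=True, similar_to_rejected=True), 2))  # 72.0
```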

Algorithm selection & fallbacks

Name-matching algorithms are selectable (LLM, Cosine, Levenshtein, Fuzzy). Choose LLM for semantic accuracy, Cosine for embedding-based similarity, Levenshtein/Fuzzy for fast character-based matching.
Fallback: If an expensive method (e.g., LLM) fails (rate-limit, outage), the system falls back to a configured lower-cost method (e.g., Cosine or Fuzzy), ensuring the job completes.
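The fallback behavior amounts to trying methods in priority order and using the first one that succeeds. The sketch below shows that pattern with stand-in scorers; `score_with_fallback`, `llm_scorer`, and `exact_scorer` are hypothetical names, and the simulated failure only exists to exercise the fallback path.

```python
def score_with_fallback(object_name, term_name, methods):
    """Try each (label, scorer) pair in order; return the first that succeeds."""
    errors = []
    for label, scorer in methods:
        try:
            return label, scorer(object_name, term_name)
        except Exception as exc:  # e.g. rate limit or outage
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all name-matching methods failed: " + "; ".join(errors))


def llm_scorer(a, b):
    # Stand-in for a remote LLM call; always fails here to exercise the fallback.
    raise TimeoutError("rate limited")


def exact_scorer(a, b):
    # Trivial local scorer used as the cheaper fallback in this sketch.
    return 1.0 if a.lower() == b.lower() else 0.0


result = score_with_fallback("Email", "email",
                             [("llm", llm_scorer), ("exact", exact_scorer)])
print(result)  # ('exact', 1.0)
```

Returning the label along with the score makes it possible to see afterwards which method actually produced each recommendation.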

Operational guidance (best practices)

Start conservative: For initial tests, set the weightages to favor Name and switch off the boosts, the data algorithm, and pattern matching. Run one experiment per algorithm (LLM, Cosine, Levenshtein, Fuzzy) to identify which works best for your dataset.
Enable boosts iteratively: Turn on exact name/synonym boosts only after validating baseline results.
Use tag- and regex-based scoping (Object & Source Selection) to reduce noise and focus the model on relevant objects.
Monitor metrics: Track acceptance/rejection ratios and auto-accept counts to tune the Minimum Smart Score for Recommendation, the auto-acceptance threshold, the Rejection Weightage, and the Name/Data/Pattern weightages.
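The metrics mentioned above can be computed from a simple log of review outcomes. The sketch below assumes such a log is available as a list of labels; the label names and the `review_metrics` helper are hypothetical, and how outcomes are exported from the application is not covered here.

```python
from collections import Counter


def review_metrics(outcomes):
    """Summarize outcomes labelled 'accepted', 'rejected', or 'auto-accepted'."""
    counts = Counter(outcomes)
    reviewed = counts["accepted"] + counts["rejected"]  # manual decisions only
    return {
        "acceptance_ratio": counts["accepted"] / reviewed if reviewed else None,
        "rejection_ratio": counts["rejected"] / reviewed if reviewed else None,
        "auto_accepted": counts["auto-accepted"],
    }


log = ["accepted", "rejected", "accepted", "auto-accepted", "accepted"]
print(review_metrics(log))
# {'acceptance_ratio': 0.75, 'rejection_ratio': 0.25, 'auto_accepted': 1}
```

A rising rejection ratio suggests raising the Minimum Smart Score for Recommendation or the Rejection Weightage; a high acceptance ratio with few auto-accepts suggests the auto-acceptance threshold could be lowered.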

Data Classification Recommendation - Strategy

To learn more about the DCR strategy, refer to the Data Classification Recommendation (DCR) – Model Strategy Framework.
