AI Models

AI Models operate on business glossary terms, which group and describe similar data objects. Terms make data objects easier to govern, because policies can be applied and metadata standardized at the term level. To decide where a term applies, the AI models examine the metadata, data, and patterns of the target data objects, then recommend and apply relevant terms and classifications.

Use Case

Security: In a healthcare organization, the Data Classification Recommendation module can help identify and secure patient records to comply with HIPAA regulations. It classifies records containing sensitive data, such as medical history, diagnoses, and treatments, so that enhanced security measures can be applied. Users can likewise use AI models to get recommendations tailored to financial data types, such as credit card numbers, bank account details, and transaction amounts. The bulk operations feature lets users review recommendations and accept or reject them efficiently.
Literacy: In multinational corporations, the Data Classification Recommendation module enhances employee data literacy across departments. For instance, marketing teams classify customer demographics and preferences, while legal teams classify contracts and agreements.

Execution Methods

Users can execute the AI Model for a whole domain, like Privacy, and receive recommendations for all PII terms.

Manual Execution or Scheduled Runs: Users can manually run AI Models to receive recommendations or schedule them to execute within specific time frames.
Model Modification: Users can customize AI Models to meet their specific requirements.
Viewing Recommendations: Within the AI Models, users can view recommendations on data objects by simply selecting Yes/No.

AI Model Logic

Behind the scenes, the AI Models rely on the Smart Score, which analyzes an object's Name, Data, and Pattern to determine how relevant an object-term relation is. The model starts by analyzing the characteristics of the data objects, including their name, metadata, and content. It then examines the patterns in the data, including the frequency and co-occurrence of different terms within the data objects.

Once the Smart Score is calculated, the AI model generates a list of the most relevant data objects for the given term.

Smart Score Parameters

The Smart Score is the core metric used by the Data Classification Recommendation (DCR) model to determine the relevance between a data object and a business glossary term.

It evaluates similarity across four primary parameters — Name, Data, Patterns, and Data Types — and applies configurable weightages to calculate the final score.

Name Score

Purpose: Evaluate how closely a data object’s name matches the glossary term name or its synonyms.

Calculation Logic:

The model compares each object name against aggregated names in the AI Model (built from associated objects).
Similarity is computed using the selected name matching algorithm (Fuzzy Logic, Cosine Similarity, or LLM).
If the same name appears multiple times in associated objects (e.g., Email:2), its frequency boosts the score.
The maximum match value across all comparisons is used as the final Name Score.

Example:
Term: Email
Associated Objects: Email, EmailID (Email repeated twice)
Object under recommendation: emailaddress

| Comparison | Raw Score | Repetition Boost | Final Name Score |
| --- | --- | --- | --- |
| Email → emailaddress | | +2 (0.1 boost × repetition count) | |
| EmailID → emailaddress | | | |

→ The maximum (30) is taken as the final Name Score.
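
The name-matching steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the product's implementation: Python's `difflib.SequenceMatcher` stands in for the configured name matching algorithm, and the function name `name_score`, the 0-100 scale, and the `boost_per_repeat` rule for repeated names are all hypothetical.

```python
from difflib import SequenceMatcher


def name_score(candidate, associated_counts, boost_per_repeat=1.0):
    """Return the highest boosted similarity between `candidate` and any associated name.

    `associated_counts` maps each associated object name to the number of
    times it occurs among the term's associated objects (e.g. {"Email": 2}).
    """
    best = 0.0
    for name, count in associated_counts.items():
        # Stand-in similarity on a 0-100 scale (assumption, not the product's algorithm).
        raw = SequenceMatcher(None, name.lower(), candidate.lower()).ratio() * 100
        # Assumed rule: names occurring more than once receive a frequency boost.
        boost = boost_per_repeat * count if count > 1 else 0.0
        # Keep the maximum across all comparisons.
        best = max(best, raw + boost)
    return best


score = name_score("emailaddress", {"Email": 2, "EmailID": 1})
print(round(score, 1))
```

The numbers this sketch produces will not match the documented example, because the actual similarity algorithm and boost factor are internal to the product.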

Additional Boosts:

Exact Name Match: +10 Smart Score boost
Synonym Match: +10 boost if the term’s AI keyword (e.g., “E-message”) is found in the object name
Skip Object: If “Skip if Object Name ≠ Term Name/Synonym” is enabled, objects failing name/synonym criteria are ignored.
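
As a rough sketch of how these boosts and the skip setting might combine, consider the function below. The name `apply_name_boosts`, the case-insensitive comparison, and the choice to let both boosts stack are assumptions made for illustration; the source does not state how the product resolves these details.

```python
def apply_name_boosts(object_name, term_name, synonyms, score, skip_unmatched=False):
    """Apply the exact-name and synonym boosts, or return None if the object is skipped."""
    obj = object_name.lower()
    exact = obj == term_name.lower()
    # A synonym (AI keyword) counts as matched when it appears inside the object name.
    synonym = any(s.lower() in obj for s in synonyms)

    # Mirrors "Skip if Object Name != Term Name/Synonym": ignore non-matching objects.
    if skip_unmatched and not (exact or synonym):
        return None

    if exact:
        score += 10  # Exact Name Match boost
    if synonym:
        score += 10  # Synonym Match boost
    return score


print(apply_name_boosts("cust_e-message", "Email", ["E-message"], 30))
print(apply_name_boosts("phone", "Email", ["E-message"], 30, skip_unmatched=True))
```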

Data Score

Purpose: Measure similarity between the data values of the object and those of already associated term objects.

Calculation Logic:

Compares top profiled values from the object with the aggregated top values in the AI Model.
The match is calculated based on the intersection of values and total data volume.
Data type consistency can provide an additional score boost.

Example:
Term: Email
Top Values in Model: {[email protected], [email protected]}
Top Values in Object: {[email protected], [email protected], [email protected]}

→ MatchCount = 2 → Data Score ≈ 1.5 (calculated through the internal formula)
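
The intersection step can be illustrated with a short sketch. The internal formula that turns the match count into a Data Score is not documented here, so the code only computes the match count and a simple overlap share; the function names and the `example.com` addresses are hypothetical.

```python
def match_count(model_values, object_values):
    """Number of distinct top values shared by the model and the object."""
    return len(set(model_values) & set(object_values))


def overlap_share(model_values, object_values):
    """Shared values as a fraction of the object's distinct top values (assumed measure)."""
    obj = set(object_values)
    return len(set(model_values) & obj) / len(obj) if obj else 0.0


# Hypothetical top values, for illustration only.
model_top = ["a@example.com", "b@example.com"]
object_top = ["a@example.com", "b@example.com", "c@example.com"]

print(match_count(model_top, object_top))
print(overlap_share(model_top, object_top))
```

The overlap share is not the Data Score itself; it only shows the kind of input that a volume-aware formula could use.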

Data Type Match - Boost Smart Score

Purpose: Reward semantic consistency in data type usage across term-associated objects.

Calculation Logic:

If a column’s data type matches the data type of the term’s associated objects (e.g., both are VARCHAR or DATE), a fixed boost is applied.

Example:
Term: Transaction Date → DATE type
Object column: Order Date → DATE type

→ +10 added to Smart Score.

→ If Data Matching is ON and Data Type Match Boost = 10, a Data Score of 1.5 gives a final score of 1.5 + 10 = 11.5
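
The arithmetic above can be expressed as a small sketch. The function name, its parameters, and the behavior when Data Matching is off (the data component contributes nothing) are assumptions for illustration only.

```python
def data_score_with_type_boost(data_score, object_type, term_types,
                               boost=10.0, data_matching=True):
    """Add a fixed boost when the object's data type matches a term-associated type."""
    if not data_matching:
        # Assumption: with Data Matching off, the data component is ignored.
        return 0.0
    term_types_upper = {t.upper() for t in term_types}
    if object_type.upper() in term_types_upper:
        return data_score + boost
    return data_score


print(data_score_with_type_boost(1.5, "DATE", {"DATE"}))
print(data_score_with_type_boost(1.5, "VARCHAR", {"DATE"}))
```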

Pattern Score

Purpose: Identify structural pattern similarities (e.g., formats or regular expressions) between the term’s data and the object’s data.

Calculation Logic:

Extracts profiling patterns (e.g., DDD-DD-DDDD, [email protected]) from associated data.
Compares these against the object's data patterns.
Matches are scored and weighted based on pattern frequency.

Example:
Term: SSN
Associated Object Pattern: DDD-DD-DDDD
Object Column: emp_ssn → matches the same pattern twice.
Score: 55 (calculated through the internal formula)

If Pattern Matching = ON, Pattern Score = 55; otherwise the Pattern Score is ignored.
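
To make the idea of a profiling pattern concrete, the sketch below derives a pattern such as DDD-DD-DDDD from raw values and measures how often an object's values follow one of the term's patterns. The letter placeholder `A`, the function names, and the frequency-based share are assumptions; the internal formula behind the documented score of 55 is not reproduced, and the sample values are made up.

```python
from collections import Counter


def to_pattern(value):
    """Map digits to 'D' and letters to 'A' (assumed notation); keep other characters."""
    return "".join(
        "D" if ch.isdigit() else "A" if ch.isalpha() else ch
        for ch in value
    )


def pattern_match_share(term_values, object_values):
    """Fraction of object values whose pattern also occurs in the term's data."""
    term_patterns = {to_pattern(v) for v in term_values}
    counts = Counter(to_pattern(v) for v in object_values)
    total = sum(counts.values())
    matched = sum(n for pattern, n in counts.items() if pattern in term_patterns)
    return matched / total if total else 0.0


# Made-up sample values, for illustration only.
print(to_pattern("123-45-6789"))
print(pattern_match_share(["123-45-6789"], ["987-65-4321", "111-22-3333", "n/a"]))
```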

Rejected Score (Penalty)

Purpose: Incorporate learning from user rejections to reduce false positives.

Calculation Logic:

Each rejected object contributes to a “negative model.”
During the next run, if a new object resembles a previously rejected one, its name score is reduced by the Rejected Score Weightage factor.

Smart Score = [Name + Data + Pattern – (Rejected_Fuzzy × Rejected_Weightage)] / 100

Example: Rejected fuzzy = 20, Weightage = 0.2 → Deduction = 20 × 0.2 = 4
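
The Smart Score formula can be written as a short function. The function and parameter names are hypothetical, and the component values in the usage example (30, 11.5, 55) are borrowed from the separate Name, Data, and Pattern examples purely to show the arithmetic; they do not describe a single real object.

```python
def smart_score(name, data, pattern, rejected_fuzzy=0.0, rejected_weightage=0.0):
    """Smart Score = [Name + Data + Pattern - (Rejected_Fuzzy x Rejected_Weightage)] / 100."""
    deduction = rejected_fuzzy * rejected_weightage
    return (name + data + pattern - deduction) / 100


# Component values combined for illustration only.
score = smart_score(name=30, data=11.5, pattern=55,
                    rejected_fuzzy=20, rejected_weightage=0.2)
print(score)
```

With these inputs the deduction is 4, so the bracketed sum is 92.5 and the result is approximately 0.925.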

Access Control

Access to the Data Classification Recommendation is handled through Application Security. Users with an Author license in the Authorized Roles section can access the Data Classification Recommendation Modules and create Models. Stewards of the term and data objects with access to the Data Classification Recommendation page can also visit the page and accept or reject the recommendations.
