AI Enrichments

Why AI Enrichment Matters

Exploratory analysis provides answers, but deeper insight often requires creating new columns, classifying values, or enriching existing data with advanced intelligence. Traditionally, these tasks involve writing formulas, scripts, or running models in external tools.

askEdgi integrates these capabilities directly into the Workspace through AI Functions. Using natural language prompts, new calculated columns can be created, records classified, or text data analyzed without the need for external scripting or additional tools.

What Can Be Done

AI Functions provide the ability to:

  • Create calculated columns (e.g., profit margin percentage)

  • Perform classifications (e.g., High/Medium/Low risk)

  • Apply text analysis techniques, including sentiment analysis, intent detection, emotion recognition, and classification

Use Case & Real-Life Scenario

Continuing the return analysis, a product manager explores severity levels using AI enrichment:

“Create a new column for return_rate = (returns/orders) * 100.”

askEdgi generates a calculated column return_rate for each product.

“Classify products into High, Medium, or Low return categories based on return_rate (High: >30%, Medium: 10–30%, Low: <10%).”

askEdgi creates a new column, return_category, to highlight product risk levels.

“Perform sentiment analysis on customer_reviews.”

askEdgi enriches the dataset with a sentiment_label column (Positive, Neutral, Negative).

By combining return patterns with sentiment insights, the analysis reveals that products with negative sentiment also report the highest return rates, signaling potential quality issues.
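The first two prompts in this scenario amount to simple arithmetic and threshold logic. The following is a minimal sketch in plain Python of what the resulting columns would contain. The sample rows and function names are illustrative assumptions, not askEdgi internals, and treating exactly 10% and 30% as Medium is one reading of the "10–30%" range in the prompt.

```python
# Illustrative only: sample rows and helper names are assumptions,
# not askEdgi's actual implementation.

def return_rate(returns: int, orders: int) -> float:
    """return_rate = (returns / orders) * 100, as in the first prompt."""
    if orders == 0:
        return 0.0  # assumption: products with no orders get a 0% return rate
    return (returns / orders) * 100


def return_category(rate: float) -> str:
    """High: >30%, Medium: 10-30%, Low: <10%, as in the second prompt."""
    if rate > 30:
        return "High"
    if rate >= 10:
        return "Medium"
    return "Low"


products = [
    {"product": "A", "returns": 40, "orders": 100},
    {"product": "B", "returns": 15, "orders": 100},
    {"product": "C", "returns": 2, "orders": 100},
]

# Add the two derived columns to every row.
for row in products:
    row["return_rate"] = return_rate(row["returns"], row["orders"])
    row["return_category"] = return_category(row["return_rate"])
```

In this sample, product A falls into High, B into Medium, and C into Low.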

Types of AI Enrichment

Prompt Analysis: Evaluates the clarity and effectiveness of user-generated prompts to ensure accurate results.

Example: Generate a new column comparing income in the dataset against the average income to provide deeper insight into earning levels.

Sentiment Analysis: Classifies text data into Positive, Neutral, or Negative categories.

Example: Analyze customer reviews, browsing history, and purchase records to gain insights into customer behavior.

Intent Analysis: Identifies underlying intent in textual data, classifying it into predefined categories.

Example: Detect intent in customer support or compliance interactions.

Emotion Analysis: Detects emotional tones in text for better understanding of customer experiences.

Example: Assess emotions in product reviews or support conversations.

Text Classification: Categorizes text into domain-specific classes such as fraud detection or spam filtering.

Example: Apply classification models to incoming emails or financial records to automate data analysis and processing.

Proofreading: Identifies grammatical, clarity, and structural issues to ensure professional communication.

Example: Refine business documents or product descriptions for improved readability.

Availability

  • Public – Available (AI column creation and enrichment)

  • SaaS – Available (full AI functions)

  • On-Prem – Limited (Metadata analytics only)

AI Prompt and Context Visibility

askEdgi generates columns, insights, and results using AI based on user inputs and available data context. To improve transparency and trust, the system now allows users to view the prompt and contextual information used during AI processing.

This capability helps users understand how a result was generated, making it easier to explain outputs, validate logic, and troubleshoot unexpected results.

The feature is available only for AI-generated artifacts and is accessible directly from the analysis output.

Functional Behavior

For any column or result generated using AI:

  • Users can view the prompt sent to the AI

  • Users can view the context used during execution, such as:

    • Tables involved

    • Columns referenced

    • Relevant metadata or glossary terms

This information is presented in a read-only format and is intended for understanding and validation purposes only.

Accessing AI Prompt Details

Users can access prompt and context details using the following entry points:

  1. AI Icon on Output

    1. AI-generated columns or results are marked with an AI indicator

    2. Clicking or interacting with this icon reveals prompt details

  2. View AI Prompt Option

    1. A dedicated action may be available to open the prompt view

Both options provide the same information and are designed to be easily accessible during analysis.

Prompt and Context Details

When accessed, the system displays:

  • The exact prompt sent to the AI

  • The context used for generating the result

  • Clear distinction between:

    • User-provided input

    • System-generated prompt components

This ensures that users can fully understand how the AI arrived at a specific output.

Hover-Based Quick View

For quick inspection, users can hover over the AI-generated function or column.

The hover view provides:

  • Function name

  • Input columns used

  • Output column generated

  • AI prompt used

  • Confidence or accuracy (if available)

  • Timestamp of creation

This allows users to quickly inspect AI behavior without opening a detailed view.

Retrieval Augmented Generation (RAG) in askEdgi

Retrieval Augmented Generation (RAG) in askEdgi refers to answering questions by first retrieving trusted enterprise context and then generating responses based on that context.

askEdgi is not a generic AI chatbot.

It does not rely on assumptions or general knowledge.

Instead, askEdgi:

  • Understands business terms defined by the organization

  • Knows what data exists and how it is governed

  • Considers structural relationships between datasets

  • Uses metadata, lineage, and contextual statistics

  • Suggests relevant and trusted assets before execution

  • Explains answers using enterprise context

This approach ensures that responses are grounded in how the organization understands and manages its data.

Business Value of RAG

Business users frequently need answers to questions such as:

  • Which dataset should be used for a metric?

  • What does a business term mean?

  • Where does a number originate?

  • Why do different reports show different values?

  • Which datasets should be combined?

These questions require understanding business meaning, structural relationships, and governance alignment.

Retrieval Augmented Generation ensures:

  • Accurate answers

  • Consistent interpretation

  • Connected datasets

  • Trustworthy analysis

Enterprise Context Used by RAG

The enterprise context retrieved by askEdgi includes the following elements.

| Context Type | Description |
| --- | --- |
| Business glossary | Approved business definitions and terminology |
| Curated datasets | Trusted and governed data assets |
| Governance information | Ownership, classification, and access controls |
| Metadata and documentation | Business and technical descriptions |
| Dataset relationships | Structural connections between assets |
| Lineage information | Data movement and dependency paths |
| Contextual statistics | Sample characteristics and value distributions |
| Top values | Frequently occurring values used for interpretation |

Note: Sample statistics and top value summaries improve interpretation but remain secondary to governed metadata and definitions.

askEdgi Modes and the Role of RAG

askEdgi operates in two clearly defined modes to support different user needs. Each mode has a clear purpose and boundary that ensures trust and predictability.

Analysis Mode

Analysis Mode is the default and primary mode. It supports the complete journey from understanding a question to generating insights.

Purpose of Analysis Mode

Analysis Mode supports the following activities:

  • Understanding a question

  • Identifying the correct data

  • Validating how datasets relate

  • Performing analysis

  • Receiving business-aligned explanations

Analysis Mode supports the complete journey from discovery to insight without switching between guidance and execution modes.

Understand RAG Usage in Analysis Mode

In Analysis Mode, Retrieval Augmented Generation performs the following functions:

  • Interpret the business meaning behind a question

  • Retrieve relevant glossary definitions

  • Surface business and technical descriptions of assets

  • Evaluate asset metadata and documentation

  • Analyze relationships between datasets

  • Eliminate unrelated or disconnected tables

  • Confirm that selected datasets combine correctly

Retrieval Augmented Generation ensures that reasoning is grounded in business context before execution begins.

Understand Relationship Aware Intelligence

When a question spans multiple datasets, askEdgi performs structural validation.

askEdgi performs the following actions:

  • Confirm that selected assets are structurally connected

  • Avoid combining unrelated datasets

  • Suggest additional related assets only when required

  • Limit context expansion to what is necessary

This prevents:

  • Incorrect joins

  • Over-selection of irrelevant tables

  • Misleading analysis

  • Loss of trust

Datasets are validated as part of a connected data ecosystem rather than isolated objects.

Understand Workspace First Execution

Analysis Mode respects the Workspace as the execution boundary.

Execution rules are as follows:

  • If tables remain pinned, analysis is restricted to pinned tables

  • If tables are not pinned, eligible workspace tables are considered

  • Additional catalog assets are surfaced only when necessary

  • Data outside the intended scope is not analyzed

This ensures controlled execution.
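The first two execution rules can be read as a simple scoping function. The sketch below is a hypothetical illustration: the function name and the list-of-table-names representation are assumptions, and the surfacing of additional catalog assets is not modeled.

```python
# Hypothetical sketch of the scoping rules; names are assumptions,
# not askEdgi's actual implementation.

def execution_scope(pinned: list[str], workspace_tables: list[str]) -> list[str]:
    """Return the tables an analysis may read.

    - If any tables are pinned, only the pinned tables are in scope.
    - Otherwise, all eligible workspace tables are in scope.
    """
    if pinned:
        return list(pinned)
    return list(workspace_tables)
```

For example, with "orders" pinned and both "orders" and "returns" in the Workspace, only "orders" would be in scope.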

Discovery Mode

Discovery Mode supports structured exploration of the Data Catalog. This mode does not use RAG and does not perform analysis execution.

Purpose of Discovery Mode

Discovery Mode supports:

  • Asset browsing

  • Data availability validation

  • Metadata understanding

  • Documentation review

Understand Discovery Mode Behavior

In Discovery Mode, askEdgi performs the following actions:

  • Retrieve assets from the catalog

  • Surface business descriptions and technical documentation

  • Apply governance-aware filters

  • Return metadata and definitions

RAG-based reasoning does not occur in this mode. Discovery Mode provides clarity without execution.

Note: Discovery Mode does not use RAG. Only Analysis Mode uses RAG.

How askEdgi Finds the Right Context in Analysis Mode

The following sequence describes how askEdgi retrieves context and prepares for execution.

Step 1: Determine Workspace Dependency

askEdgi evaluates whether the request requires existing workspace data.

Workspace data is required when the request:

  • Reads existing tables

  • Computes metrics from data

  • References workspace objects

  • Validates schemas

Workspace data is not required when the request:

  • Requests an example

  • Requests sample SQL or Python code

  • Requires logical reasoning without data

  • Creates new structures without referencing existing data

This separation improves clarity and efficiency.

Step 2: Evaluate Existing Workspace Context

askEdgi checks whether sufficient context already exists within the Workspace.

If sufficient context exists, search expansion does not occur.

Step 3: Enrich Business Understanding

When additional clarity is required, askEdgi retrieves:

  • Glossary definitions

  • Asset descriptions

  • Metadata context

This ensures the correct interpretation of business intent.

Step 4: Suggest Relevant Assets

When necessary, askEdgi identifies additional datasets aligned with business intent.

Only governed and relevant assets are considered.

Step 5: Validate Dataset Compatibility

Before execution, askEdgi confirms:

  • Required attributes exist

  • Datasets are structurally connected

  • Necessary relationships are available

  • Required elements are complete

If validation fails, execution does not proceed.

Step 6: Execute Analysis

Execution occurs only after:

  • Context is sufficient

  • Relationships are confirmed

  • Required data elements exist

Execution is intentional and validated.
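The checks in Steps 5 and 6 can be pictured as two gates: every required column must exist, and the selected tables must form a connected graph through known relationships. The following is a minimal sketch under those assumptions. The data shapes (a dict of table name to column set, and a list of table pairs) and all function names are hypothetical and are not taken from askEdgi.

```python
from collections import deque


def missing_attributes(schemas, required):
    """Return {table: [missing columns]} for required columns absent from schemas."""
    gaps = {}
    for table, columns in required.items():
        absent = [c for c in columns if c not in schemas.get(table, set())]
        if absent:
            gaps[table] = absent
    return gaps


def structurally_connected(tables, relationships):
    """True if every table is reachable from the first via known relationships."""
    tables = list(dict.fromkeys(tables))  # de-duplicate, keep order
    if len(tables) <= 1:
        return True
    neighbours = {t: set() for t in tables}
    for left, right in relationships:
        if left in neighbours and right in neighbours:
            neighbours[left].add(right)
            neighbours[right].add(left)
    # Breadth-first search starting from the first table.
    seen, queue = {tables[0]}, deque([tables[0]])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(tables)


def can_execute(schemas, required, relationships):
    """Proceed only when both checks pass; otherwise execution stops."""
    return not missing_attributes(schemas, required) and structurally_connected(
        required, relationships
    )
```

If either gate fails, the sketch returns False, mirroring the rule that execution does not proceed when validation fails.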

How askEdgi Handles Missing or Incomplete Information

askEdgi stops intentionally to ensure accurate and trustworthy results.

askEdgi stops when:

  • Required data is missing

  • Structural compatibility cannot be confirmed

  • Business context is unclear

  • Confidence is insufficient

askEdgi does not:

  • Guess

  • Partially execute

  • Assume schema

This results in predictable and trustworthy outcomes.

RAG Trust in Analysis Mode

The RAG framework in askEdgi relies on controlled enterprise grounding.

RAG uses the following information sources:

  • Curated business descriptions

  • Technical documentation

  • Structured metadata

  • Relationship context between assets

  • Lineage information

  • Contextual data statistics, such as sample data characteristics

  • Top 50 values

Note: Contextual data statistics and the top 50 values improve relevance and interpretation. Governed metadata and business definitions remain the primary reference.

askEdgi enforces the following controls:

  • Respect governance and access rules

  • Validate structural compatibility before dataset combination

  • Confirm required attributes before execution

  • Stop when information remains incomplete

  • Maintain clear separation between discovery and execution

This layered grounding ensures business-aligned, structurally valid, and explainable responses.

RAG Limitations

Certain behaviors remain intentionally restricted to maintain trust.

askEdgi avoids:

  • Guessing or fabricating answers

  • Ignoring governance controls

  • Combining unrelated datasets

  • Excessive expansion across the data ecosystem

  • Execution with incomplete schema validation

Restraint remains a core system principle.

Business Impact

Organizations gain the following benefits:

  • Faster and safer data discovery

  • Reduced dependency on technical teams

  • Fewer incorrect dataset combinations

  • Strong structural validation before analysis

  • Higher trust in analytics and reporting

  • Streamlined workflows without mode confusion

  • Better alignment between business and data teams

askEdgi serves as a reliable entry point to enterprise data knowledge and trusted analysis.

Summary

RAG forms the foundation that makes askEdgi:

  • Context-aware instead of generic

  • Relationship-aware instead of isolated

  • Schema-aware instead of assumptive

  • Dependency-aware instead of speculative

  • Trusted instead of uncertain

  • Business aligned instead of technically driven

Clear separation between Analysis Mode and Discovery Mode with structured validation before execution ensures intentional, explainable, and trustworthy interactions.

Metadata-Aware Retrieval and Ranking

This section describes the enhanced Retrieval-Augmented Generation (RAG) capabilities in askEdgi, which introduce metadata-aware retrieval, governance-driven ranking, and intelligent embedding strategies to improve the relevance, trustworthiness, and explainability of results.

Retrieval Architecture

The askEdgi retrieval process follows a structured, multi-stage pipeline that balances semantic relevance with governance trust signals.

Stage 1 – Relevance Retrieval

  • User query is transformed into a semantic embedding.

  • The system performs:

    • Vector similarity search to capture semantic intent

    • Keyword-based search to capture exact matches across metadata attributes

  • Results from both approaches are combined to generate a candidate set of data assets (typically 20–30 objects).

Stage 2 – Governance Re-Ranking

  • Candidate assets are re-evaluated using governance and trust signals.

  • Each asset is assigned a final ranking score based on relevance and governance strength.

  • The top-ranked assets (typically 5–10) are selected and passed to downstream processing for response generation.

Key Characteristics

  • Retrieval prioritizes both intent relevance and metadata quality.

  • Governance signals influence ranking, ensuring that curated and trusted assets are preferred.

  • Classification levels (e.g., Public, Internal, Restricted) are applied as access controls and do not influence ranking.
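Stage 1 can be sketched as a hybrid search that scores each asset on both vector similarity and keyword overlap. The example below is a simplified illustration, not the product's implementation: the toy vectors stand in for real embeddings, the keyword score is an assumed term-overlap fraction, the two scores are combined by a simple average, and the asset dictionary fields are hypothetical. Classification is applied as an access filter before scoring, so it never affects rank.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the asset's metadata text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def candidates(query, query_vec, assets, allowed_levels, limit=30):
    """Return up to `limit` (relevance, name) pairs, best first."""
    scored = []
    for asset in assets:
        # Classification acts as an access control, not a ranking signal.
        if asset["classification"] not in allowed_levels:
            continue
        relevance = (
            cosine(query_vec, asset["vector"]) + keyword_score(query, asset["text"])
        ) / 2
        scored.append((relevance, asset["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

The default `limit` of 30 reflects the upper end of the typical candidate set size mentioned above. The candidates would then go to governance re-ranking.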

Embedding Construction

Each data asset is converted into a structured semantic representation prior to embedding generation.

Context Construction

Instead of treating metadata as independent attributes, the system constructs a business-context narrative that captures:

  • Object title and description

  • Business context and usage

  • Glossary associations and hierarchy

  • Tag hierarchy and classification

  • Custom fields

  • Governance indicators

  • Data quality context

This contextual representation is then converted into a vector embedding.

Key Characteristics

  • Improves semantic understanding of data assets

  • Enhances alignment between user intent and retrieved results

  • Enables more accurate and context-aware retrieval

Ranking and Scoring Model

The final ranking of data assets is determined using a combination of relevance and governance signals.

Final Ranking Formula

FinalScore = (0.75 × SearchRelevanceScore) + (0.25 × GovernanceScore)

Where:

SearchRelevanceScore = (VectorSimilarity + KeywordScore) / 2

Governance Score Calculation

GovernanceScore represents the trustworthiness and curation level of a data asset.

Contributing Signals

  • Critical Data Element (CDE) indicator

  • Data Quality score

  • Certification status

  • Authoritative dataset flag

Weight Distribution

  • CDE Indicator – 40%

  • Data Quality Score – 30%

  • Certification Status – 20%

  • Authoritative Flag – 10%

Key Characteristics

  • Prioritizes curated and enterprise-approved datasets

  • Ensures that highly governed assets rank higher than unmanaged data

  • Maintains balance between relevance and trust
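As a worked illustration of the formulas and weights above, the sketch below computes the three scores in plain Python. The weights and the 0.75/0.25 split come from this section. The assumption that every input signal is normalized to the range 0 to 1 (with yes/no signals expressed as 1 or 0) is ours, since the text does not specify input scales.

```python
# Weights are taken from the text; input normalisation is an assumption.
GOVERNANCE_WEIGHTS = {
    "cde": 0.40,
    "data_quality": 0.30,
    "certified": 0.20,
    "authoritative": 0.10,
}


def governance_score(cde, data_quality, certified, authoritative):
    """Weighted sum of governance signals, each assumed to lie in 0..1."""
    signals = {
        "cde": cde,
        "data_quality": data_quality,
        "certified": certified,
        "authoritative": authoritative,
    }
    return sum(GOVERNANCE_WEIGHTS[name] * float(value) for name, value in signals.items())


def search_relevance_score(vector_similarity, keyword_score):
    """SearchRelevanceScore = (VectorSimilarity + KeywordScore) / 2."""
    return (vector_similarity + keyword_score) / 2


def final_score(relevance, governance):
    """FinalScore = 0.75 x relevance + 0.25 x governance."""
    return 0.75 * relevance + 0.25 * governance
```

Under these assumptions, a certified CDE with a data quality score of 0.9 has a governance score of about 0.87. With a relevance of 0.7, its final score is about 0.74, which places it ahead of an ungoverned asset with a relevance of 0.8 and a final score of 0.6.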

Metadata Curation Score Gate (Embedding Eligibility)

A metadata curation score is used to determine whether a data asset is eligible for inclusion in the embedding and retrieval process.

Functionality

  • Assets are evaluated based on a composite metadata curation score.

  • Only assets meeting the configured threshold are:

    • Embedded into the vector index

    • Considered during analysis and retrieval

  • Assets below the threshold:

    • Remain available in the catalog

    • Are excluded from analysis-driven retrieval

Configuration

  • Threshold is configurable (range: 0–100)

  • The default value allows the inclusion of all assets

  • Designed to support gradual adoption based on governance maturity

Re-Evaluation

  • Eligibility is periodically re-evaluated

  • Assets can automatically enter or exit the embedding pool based on updated metadata quality

Steward Feedback

  • When assets fall below the threshold, stewards are provided with:

    • Current score

    • Required threshold

    • Key metadata gaps

    • Recommended improvements
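The gate can be pictured as a threshold filter over curation scores. The sketch below is a hypothetical illustration. It assumes that "meeting" the threshold means a score greater than or equal to it, and that the default which includes all assets corresponds to a threshold of 0. The function and field names are not taken from the product.

```python
def split_by_threshold(curation_scores, threshold=0):
    """Partition assets into embedding-eligible and excluded sets.

    curation_scores: {asset_name: score in 0..100}
    threshold: 0..100; the assumed default of 0 admits every asset.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    eligible = {name for name, score in curation_scores.items() if score >= threshold}
    excluded = set(curation_scores) - eligible
    return eligible, excluded


def steward_feedback(name, score, threshold, gaps):
    """Build the feedback record shown to a steward for an excluded asset."""
    return {
        "asset": name,
        "current_score": score,
        "required_threshold": threshold,
        "metadata_gaps": list(gaps),
    }
```

Re-running `split_by_threshold` after metadata updates models the periodic re-evaluation: an asset whose score rises above the threshold moves into the eligible set.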

Custom Field Trust Signal Integration

Custom fields are interpreted as governance signals to enhance trust scoring without requiring explicit configuration.

Signal Identification

  • The system analyzes custom field names using predefined keyword patterns.

  • Matching fields are mapped to governance signal categories such as:

    • CDE indicators

    • Data quality rules

    • Authoritative source indicators

    • Policy and compliance references

Signal Processing

  • If a native governance signal is available, it is used directly.

  • If not, matched custom fields act as proxy signals.

Confidence Adjustment

  • Custom field–derived signals are applied with a confidence factor to ensure balanced scoring.

Validation

  • Signals are applied only when associated with valid object types

  • Prevents incorrect or irrelevant mappings

Key Characteristics

  • Enables the utilization of governance information stored in custom fields

  • Improves the accuracy of trust scoring across diverse implementations

  • Reduces dependency on the strict standardization of metadata models
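The following sketch illustrates how custom field names might be matched to signal categories and how a proxy signal could be discounted. The keyword patterns and the 0.5 confidence factor are placeholders invented for illustration, because the actual patterns and factor are not documented here. Object-type validation is not modeled.

```python
import re

# Hypothetical keyword patterns; the product's real patterns are not documented here.
SIGNAL_PATTERNS = {
    "cde": re.compile(r"\b(cde|critical data element)\b", re.IGNORECASE),
    "data_quality": re.compile(r"\b(dq|data quality)\b", re.IGNORECASE),
    "authoritative": re.compile(r"\b(authoritative|golden source)\b", re.IGNORECASE),
    "policy": re.compile(r"\b(policy|compliance)\b", re.IGNORECASE),
}

PROXY_CONFIDENCE = 0.5  # assumed discount for custom-field-derived signals


def map_custom_field(field_name):
    """Return the governance signal category a field name suggests, or None."""
    normalized = field_name.replace("_", " ")  # underscores would hide word boundaries
    for signal, pattern in SIGNAL_PATTERNS.items():
        if pattern.search(normalized):
            return signal
    return None


def resolve_signal(native_value, proxy_value):
    """Prefer the native signal; otherwise use the discounted proxy."""
    if native_value is not None:
        return native_value
    if proxy_value is not None:
        return proxy_value * PROXY_CONFIDENCE
    return 0.0
```

Applying a confidence factor below 1 keeps a proxy signal from counting as much as a native governance signal of the same value.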

Prompt Guidance for Next-Best Questions

askEdgi provides prompt guidance to assist users in continuing their analysis by suggesting relevant follow-up questions. This capability helps users refine or expand their queries without requiring manual prompt formulation.

After each user prompt is processed and a response is generated, the system evaluates:

  • The user’s original question

  • The intent identified by askEdgi

  • The results returned (tables, insights, summaries)

Based on this, the system generates a set of relevant follow-up prompts that naturally extend the current analysis.

How Prompt Guidance Works

  • Prompt suggestions are generated only after a response is produced

  • Suggestions are derived from:

    • Current query intent

    • Result patterns (e.g., trends, anomalies, groupings)

    • Available metadata and context

  • Each suggestion is:

    • Directly related to the current analysis

    • Focused on helping users go deeper or refine results

    • Written in simple, natural language

Example flow:

User asks: “Show delayed orders.”

askEdgi responds with results.

Prompt Guidance suggests:

  • “Which vendors have the highest delays?”

  • “Are delays increasing over time?”

  • “Which regions are most affected?”

Execution Boundaries

Prompt Guidance does not participate in execution or data processing. Its role is limited to suggestion generation.

  • It does not trigger:

    • Data retrieval

    • Query execution

    • Recipe execution

  • It does not:

    • Modify the current results

    • Re-run analysis

    • Interfere with the main response

Suggestions are only recommendations, and execution happens only when the user selects a suggestion or enters a new prompt.

User Interaction Model

  • Suggestions are displayed immediately after each response

  • Users can:

    • Click on a suggested prompt to continue analysis

    • Ignore suggestions and enter their own query

When a suggestion is selected:

  • It is treated as a new prompt

  • askEdgi processes it through the standard flow (intent → execution → response)

Code Explanation Panel in askEdgi

The Code Explanation Panel is a new feature that provides natural language summaries of SQL and Python code used to generate askEdgi results. This feature improves accessibility for non-technical users and increases transparency in result generation.

Accessing the Code Explanation Panel

  • Open the Code View for a query or analysis result.

  • Click on the Explanation tab, positioned next to the existing Code and Copy options.

  • askEdgi automatically generates a concise, human-readable description of the code logic.

Functionality

  • SQL Example: “This query retrieves the total sales for each region in 2023 and ranks them by revenue.”

  • Python Example: “This script converts unstructured balance sheet text into a structured DataFrame for two fiscal years.”

  • Explanations are streamlined and accurate, reflecting the code logic.

  • Users can toggle Show More / Show Less to expand or collapse longer explanations.

  • The explanation auto-refreshes whenever the code changes or is re-run.

Performance & UX

  • A loading indicator is shown while the explanation is being generated.

  • Explanations are cached for the session to improve response time on repeated views.
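One way to picture session-level caching is a small cache keyed by a hash of the code text, so that changed code always produces a fresh explanation. This is a hypothetical sketch: the class name, the `explain` callable, and the choice of SHA-256 as the key are assumptions. Refreshing on a re-run of unchanged code is not modeled.

```python
import hashlib


class ExplanationCache:
    """Session-scoped cache keyed by a hash of the code text (illustrative)."""

    def __init__(self, explain):
        self._explain = explain  # assumed callable: code string -> explanation string
        self._store = {}

    def get(self, code):
        key = hashlib.sha256(code.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Cache miss: new or changed code, so generate a fresh explanation.
            self._store[key] = self._explain(code)
        return self._store[key]
```

Repeated views of the same code return the stored explanation, while any edit to the code changes the key and triggers regeneration.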

Error Handling

  • If the explanation cannot be generated (e.g., API error, timeout, unsupported code format), the following message is displayed: “Explanation could not be generated. Please try again or refresh.”

  • For large or multi-step Python scripts, explanations are summarized in chunks (e.g., function-level or step-wise).

  • Users can optionally view a Detailed Explanation for step-wise logic.

Question Wall – Reply from Notifications

askEdgi supports replying to Question Wall conversations directly from notifications. This allows users to respond to questions without navigating back to the application, improving response time and reducing context switching.

This capability applies to notifications triggered from Question Wall activities and ensures that responses submitted externally are correctly posted back to the corresponding conversation.

Supported Notification Events

Reply capability is available for the following Question Wall events:

  • Question Assigned

  • New Reply Received

These events continue to follow the existing notification triggers and delivery behavior.

Supported Channels

Reply to notifications is supported across external notification channels used by the platform.

Users can view the notification and respond directly from the same channel without opening askEdgi.

Reply Integration

  • Users can submit a reply directly from the notification

  • The reply is posted to the same Question Wall conversation in the application

  • The response is treated as a standard reply within the thread

  • No separate workflow or approval is required

This ensures continuity of discussion regardless of where the response is submitted.

Context in Notifications

Notification content remains aligned with the triggering event.

For New Reply Received notifications:

  • The notification includes the most recent 2–3 replies for context

  • Each reply includes:

    • Author

    • Timestamp

    • Plain-text content (truncated if required)

This provides sufficient context for users to understand the conversation before responding.
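As an illustration of how such context could be assembled, the sketch below selects the most recent replies and truncates long text. The function name, the dictionary fields, the ISO 8601 timestamp strings, and the 200-character limit are all assumptions made for the example.

```python
def reply_context(replies, max_replies=3, max_chars=200):
    """Return the most recent replies, oldest first, with truncated text.

    replies: list of dicts with 'author', 'timestamp' (ISO 8601 string,
    which sorts chronologically as text), and 'content' (plain text).
    """
    recent = sorted(replies, key=lambda r: r["timestamp"])[-max_replies:]
    context = []
    for reply in recent:
        text = reply["content"]
        if len(text) > max_chars:
            # Truncate and mark the cut so readers know text was omitted.
            text = text[: max_chars - 3] + "..."
        context.append(
            {"author": reply["author"], "timestamp": reply["timestamp"], "content": text}
        )
    return context
```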

Scope of Impact

  • No change to existing notification triggers

  • No change to notification templates or event definitions

  • Replies submitted externally are fully integrated into the Question Wall thread

  • The feature does not modify:

    • Question creation flow

    • Assignment logic

    • Notification audience logic

Intelligent Query Source Detection in askEdgi

askEdgi supports Intelligent Query Source Detection, enabling automatic optimization of where a query is executed. Instead of requiring all data to be ingested into the workspace, the system can determine whether a query should run within the workspace engine or be executed directly on the original data source, such as a database or data warehouse.

This capability improves performance, reduces unnecessary data movement, and supports efficient analysis for large or real-time datasets.

Why Intelligent Source Detection Matters

Previously, all datasets needed to be fully ingested into the workspace before analysis could begin. This approach could be inefficient when working with:

  • Large datasets

  • Live enterprise databases

  • Real-time or frequently updated data

  • Data warehouses such as Snowflake

With Intelligent Source Detection, askEdgi removes this limitation by dynamically selecting the most appropriate execution environment.

How Intelligent Query Execution Works

When a user submits a query or analytical request, askEdgi automatically evaluates:

  • Where the relevant data resides

  • Whether execution is more efficient in the workspace or at the source

  • Performance, scale, and execution feasibility

Based on this evaluation, askEdgi chooses one of the following execution paths:

  • Workspace Execution

    • Queries run inside the askEdgi workspace engine when data is already ingested or best suited for in-workspace processing.

  • Source Execution

    • Queries are pushed directly to the original source system, such as a data warehouse, when execution outside the workspace is more efficient.

This ensures faster response times, reduced resource consumption, and improved scalability.

Live Source Query Mode

Live Source Query Mode allows askEdgi to execute SQL queries directly on supported source systems instead of ingesting data into DuckDB. This enables real-time analytics, minimizes data duplication, and supports environments where data movement is restricted.

When enabled, Live Source becomes the default data querying mode, and all newly added tables are placed under the Live Source section for direct execution.

Connector Configuration - Live Source Checkbox

A Live Source checkbox is available in the askEdgi settings for supported connectors (e.g., Snowflake).

  • Cached Mode (Default)

    • Tables are ingested into DuckDB

    • Full AI enrichment, transformations, recipes, and cross-source joins supported

  • Live Query Mode

    • SQL executes directly on the source system

    • No data is copied into DuckDB

    • AI enrichment, transformations, and cross-source joins are disabled

    • Only tables from the same live connector can be queried together

Live Connections in the Workspace

When Live Query Mode is enabled:

  • A Live Connections section appears in the workspace

  • Tables added from the source catalog display a Live indicator

  • No ingestion into DuckDB occurs

  • Live tables remain queryable directly on the source

Pinning Rules (Execution)

Rules

| Table Type | Pin Allowed |
| --- | --- |
| Cached (Imported) Table | ✅ Allowed |
| Live Table | ❌ Not Allowed |

Live Table Pin Attempt Behavior

A blocking popup is shown:

  • Title: Live Table Execution Not Supported

  • Message:

    • This table is queried directly from the source system and cannot be pinned for execution. To analyze this data using AskEdgi features, move the table to Imported Data.

  • Actions:

    • Move to Imported Data

    • Cancel

Hybrid Execution Blocking (Live + Cached)

If a query references both Live and Cached tables, execution is blocked before SQL generation.

  • System Message (Chat)

    • ⚠️ Mixed Data Sources Detected

    • AskEdgi cannot analyze Live and Imported data together. To continue, move the Live table into Imported Data so both tables run in the same engine.

  • CTA: Move Live Table to Imported Data

Move-to-Cache Workflow

Users may explicitly move a Live table into Cached mode.

Flow

  1. User confirms move

  2. Table is removed from Live Source

  3. Table is ingested into DuckDB

  4. Progress indicator shown

  5. On success, pinning becomes available

  6. User is prompted to rerun query

AI & Feature Limitations for Live Tables

Live tables do not support:

  • AI enrichment

  • Transformations

  • Calculated columns

  • DDL / DML operations

  • Cross-connector joins

  • Hybrid execution

Disabled features display tooltips explaining the limitation.

SQL Execution Routing Logic

| Scenario | Execution Engine |
| --- | --- |
| All tables cached | DuckDB |
| All tables live (same connector) | Source System |
| Mixed Live + Imported | ❌ Blocked |
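The routing rules can be summarized as a small decision function. This is a minimal sketch with hypothetical names and data shapes. Treating live tables from different connectors as blocked is an assumption based on the rule that only tables from the same live connector can be queried together.

```python
def route_query(tables):
    """Pick an execution engine for a query.

    tables: list of dicts with 'mode' ('cached' or 'live') and 'connector'.
    Returns 'duckdb', 'source', or 'blocked'.
    """
    if not tables:
        raise ValueError("a query must reference at least one table")
    modes = {t["mode"] for t in tables}
    if modes == {"cached"}:
        return "duckdb"  # all tables are ingested, so run in the workspace engine
    if modes == {"live"}:
        connectors = {t["connector"] for t in tables}
        # Assumption: live tables from different connectors cannot be joined.
        return "source" if len(connectors) == 1 else "blocked"
    return "blocked"  # mixed Live + Imported
```

A "blocked" result corresponds to the Mixed Data Sources message, after which the user can move the Live table to Imported Data and rerun the query.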


Copyright © 2025, OvalEdge LLC, Peachtree Corners, GA USA