Retrieval-Augmented Generation (RAG)

Introduction

Retrieval-Augmented Generation (RAG) in askEdgi means answering a question by first retrieving trusted enterprise context and then generating a response grounded in that context.

askEdgi is not a generic AI chatbot.

It does not rely on assumptions or general knowledge.

Instead, askEdgi:

Understands business terms defined by the organization
Knows what data exists and how it is governed
Considers structural relationships between datasets
Uses metadata, lineage, and contextual statistics
Suggests relevant and trusted assets before execution
Explains answers using enterprise context

This approach ensures that responses are grounded in how the organization understands and manages its data.

Business Value of RAG

Business users frequently need answers to questions such as:

Which dataset should be used for a metric?
What does a business term mean?
Where does a number originate?
Why do different reports show different values?
Which datasets should be combined?

These questions require understanding business meaning, structural relationships, and governance alignment.

Retrieval-Augmented Generation ensures:

Accurate answers
Consistent interpretation of business terms
Correctly connected datasets
Trustworthy analysis

Enterprise Context Used by RAG

The enterprise context retrieved by askEdgi includes the following elements.

| Context Type | Description |
| --- | --- |
| Business glossary | Approved business definitions and terminology |
| Curated datasets | Trusted and governed data assets |
| Governance information | Ownership, classification, and access controls |
| Metadata and documentation | Business and technical descriptions |
| Dataset relationships | Structural connections between assets |
| Lineage information | Data movement and dependency paths |
| Contextual statistics | Sample characteristics and value distributions |
| Top values | Frequently occurring values used for interpretation |

Note: Sample statistics and top value summaries improve interpretation but remain secondary to governed metadata and definitions.
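The context types above, and the priority stated in the note, can be pictured as a simple data structure. The following is a minimal sketch only: `EnterpriseContext`, its field names, and `interpret` are hypothetical names chosen for illustration and are not part of askEdgi.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EnterpriseContext:
    """Hypothetical container for the context types in the table above."""

    glossary: dict[str, str] = field(default_factory=dict)        # term -> approved definition
    curated_datasets: list[str] = field(default_factory=list)     # trusted, governed assets
    governance: dict[str, dict] = field(default_factory=dict)     # asset -> owner, classification, access
    descriptions: dict[str, str] = field(default_factory=dict)    # asset -> business/technical description
    relationships: list[tuple[str, str]] = field(default_factory=list)  # (asset, related asset)
    lineage: dict[str, list[str]] = field(default_factory=dict)   # asset -> upstream assets
    statistics: dict[str, dict] = field(default_factory=dict)     # column -> sample characteristics
    top_values: dict[str, list[str]] = field(default_factory=dict)  # column -> frequent values

    def interpret(self, term: str) -> tuple[str, str] | None:
        """Look up a term, preferring governed definitions over top values."""
        if term in self.glossary:
            return ("glossary", self.glossary[term])
        # Assumption: top values act only as a fallback hint.
        for column, values in self.top_values.items():
            if term in values:
                return ("top_values", f"appears as a frequent value in {column}")
        return None
```

In this sketch, a term that appears both in the glossary and among a column's top values resolves to the glossary definition, which mirrors the note that statistics remain secondary to governed definitions.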

askEdgi Modes and the Role of RAG

askEdgi operates in two modes, Analysis Mode and Discovery Mode, to support different user needs. Each mode has a defined purpose and boundary, which keeps its behavior predictable and trustworthy.

Analysis Mode

Analysis Mode is the default and primary mode. It supports the complete journey from understanding a question to generating insights.

Purpose of Analysis Mode

Analysis Mode supports the following activities:

Understanding a question
Identifying the correct data
Validating how datasets relate
Performing analysis
Receiving business-aligned explanations

Users move from discovery to insight within Analysis Mode, without switching between separate guidance and execution modes.

Understand RAG Usage in Analysis Mode

In Analysis Mode, Retrieval-Augmented Generation performs the following functions:

Interpret the business meaning behind a question
Retrieve relevant glossary definitions
Surface business and technical descriptions of assets
Evaluate asset metadata and documentation
Analyze relationships between datasets
Eliminate unrelated or disconnected tables
Confirm that selected datasets combine correctly

Retrieval-Augmented Generation ensures that reasoning is grounded in business context before execution begins.
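To make the retrieve-then-generate idea concrete, here is a toy sketch that retrieves glossary definitions by simple keyword matching and places them ahead of the question in a prompt. It is an illustration under stated assumptions, not askEdgi's retrieval logic: the function names, the prompt layout, and the substring matching are all hypothetical.

```python
def retrieve_definitions(question: str, glossary: dict[str, str]) -> dict[str, str]:
    """Return glossary entries whose term appears in the question (case-insensitive)."""
    lowered = question.lower()
    return {term: text for term, text in glossary.items() if term.lower() in lowered}


def build_grounded_prompt(question: str, glossary: dict[str, str]) -> str:
    """Place retrieved definitions before the question so generation is grounded in them."""
    definitions = retrieve_definitions(question, glossary)
    context = "\n".join(f"- {term}: {text}" for term, text in sorted(definitions.items()))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Example usage with a hypothetical two-term glossary.
example_glossary = {
    "net revenue": "Gross revenue minus returns",
    "churn": "Customers lost in a period",
}
example_prompt = build_grounded_prompt("What was Net Revenue last quarter?", example_glossary)
```

Only the definition relevant to the question reaches the prompt; the unrelated term is left out, which is the same idea as limiting context to what the question needs.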

Understand Relationship-Aware Intelligence

When a question spans multiple datasets, askEdgi performs structural validation.

askEdgi performs the following actions:

Confirm that selected assets are structurally connected
Avoid combining unrelated datasets
Suggest additional related assets only when required
Limit context expansion to what is necessary

This prevents:

Incorrect joins
Over-selection of irrelevant tables
Misleading analysis
Loss of trust

Datasets are validated as part of a connected data ecosystem rather than isolated objects.
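One way to picture structural validation is as a connectivity check on a graph whose nodes are datasets and whose edges are known relationships. The sketch below uses a breadth-first search; `are_connected` and the sample dataset names are hypothetical, and the source does not state that askEdgi uses this particular algorithm.

```python
from collections import deque


def are_connected(selected: set[str], relationships: list[tuple[str, str]]) -> bool:
    """Return True if the selected datasets form one connected group,
    using only relationships between selected datasets."""
    if len(selected) <= 1:
        return True
    # Build an undirected adjacency map restricted to the selected datasets.
    adjacency: dict[str, set[str]] = {name: set() for name in selected}
    for left, right in relationships:
        if left in selected and right in selected:
            adjacency[left].add(right)
            adjacency[right].add(left)
    # Breadth-first search from an arbitrary selected dataset.
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return seen == selected
```

With relationships `orders`-`customers` and `orders`-`products`, the pair `customers` and `products` is not connected on its own; adding `orders` to the selection connects them. That is the kind of case in which suggesting one additional related asset is required.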

Understand Workspace-First Execution

Analysis Mode respects the Workspace as the execution boundary.

Execution rules are as follows:

If tables are pinned, analysis is restricted to the pinned tables
If tables are not pinned, eligible workspace tables are considered
Additional catalog assets are surfaced only when necessary
Data outside the intended scope is not analyzed

This ensures controlled execution.
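The execution rules can be sketched as a small scoping function. This is an illustration only: `resolve_scope` and its parameters are hypothetical, the workspace list is assumed to be already filtered to eligible tables, and the sketch assumes that catalog assets are offered as suggestions rather than added to the scope automatically.

```python
def resolve_scope(
    pinned: list[str],
    workspace: list[str],
    catalog_candidates: list[str],
    context_sufficient: bool,
) -> tuple[list[str], list[str]]:
    """Return (tables to analyze, catalog assets to suggest)."""
    # Pinned tables, when present, are the entire execution scope.
    scope = list(pinned) if pinned else list(workspace)
    # Catalog assets are surfaced only when the current context is not enough.
    if context_sufficient:
        suggestions: list[str] = []
    else:
        suggestions = [asset for asset in catalog_candidates if asset not in scope]
    return scope, suggestions
```

Returning suggestions separately from the scope keeps data outside the intended scope from being analyzed without an explicit decision.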

Discovery Mode

Discovery Mode supports structured exploration of the Data Catalog. This mode does not use RAG and does not perform analysis execution.

Purpose of Discovery Mode

Discovery Mode supports:

Asset browsing
Data availability validation
Metadata understanding
Documentation review

Understand Discovery Mode Behavior

In Discovery Mode, askEdgi performs the following actions:

Retrieve assets from the catalog
Surface business descriptions and technical documentation
Apply governance-aware filters
Return metadata and definitions

Discovery Mode provides clarity about available assets without executing any analysis.

Important: Discovery Mode does not use RAG. Only Analysis Mode uses RAG.
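As an illustration of governance-aware retrieval without execution, the sketch below filters catalog entries by keyword and by the classifications a user is allowed to see, then returns metadata only. The `Asset` fields, the classification labels, and `discover` are hypothetical and do not describe askEdgi's actual catalog model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    description: str
    classification: str  # hypothetical labels, e.g. "public", "internal", "restricted"


def discover(assets: list[Asset], keyword: str, allowed: set[str]) -> list[Asset]:
    """Return matching assets the user may see; no data is read or analyzed."""
    needle = keyword.lower()
    return [
        asset
        for asset in assets
        # Governance filter first, then a plain keyword match on metadata.
        if asset.classification in allowed
        and (needle in asset.name.lower() or needle in asset.description.lower())
    ]
```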

How askEdgi Finds the Right Context in Analysis Mode

The following sequence describes how askEdgi retrieves context and prepares for execution.

Step 1: Determine Workspace Dependency

askEdgi evaluates whether the request requires existing workspace data.

Workspace data is required when the request:

Reads existing tables
Computes metrics from data
References workspace objects
Validates schemas

Workspace data is not required when the request:

Requests an example
Requests sample SQL or Python code
Requires logical reasoning without data
Creates new structures without referencing existing data

This separation keeps requests that need no data from triggering unnecessary workspace lookups.
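The decision in this step reduces to a simple rule: workspace data is required if any one of the four listed conditions holds. A minimal sketch follows, assuming the request has already been analyzed into boolean traits; `RequestTraits` and `requires_workspace_data` are hypothetical names, and how askEdgi derives these traits from a question is not covered by the source.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RequestTraits:
    """Hypothetical flags, one per condition that requires workspace data."""

    reads_existing_tables: bool = False
    computes_metrics_from_data: bool = False
    references_workspace_objects: bool = False
    validates_schemas: bool = False


def requires_workspace_data(traits: RequestTraits) -> bool:
    """Workspace data is needed as soon as any single condition is true."""
    return any(getattr(traits, f.name) for f in fields(traits))
```

A request for an example or for sample SQL sets none of these flags, so it would proceed without workspace data.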

Step 2: Evaluate Existing Workspace Context

askEdgi checks whether sufficient context already exists within the Workspace.

If it does, askEdgi does not expand the search.

Step 3: Enrich Business Understanding

When additional clarity is required, askEdgi retrieves:

Glossary definitions
Asset descriptions
Metadata context

This ensures the correct interpretation of business intent.

Step 4: Suggest Relevant Assets

When necessary, askEdgi identifies additional datasets aligned with business intent.

Only governed and relevant assets are considered.

Step 5: Validate Dataset Compatibility

Before execution, askEdgi confirms:

Required attributes exist
Datasets are structurally connected
Necessary relationships are available
Required elements are complete

If validation fails, execution does not proceed.

Important: askEdgi stops and requests clarification instead of executing with incomplete data.
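The attribute check in this step can be sketched as a comparison between the columns a question needs and the columns each dataset actually has. The function name, the message wording, and the dictionary-of-sets schema representation are assumptions made for illustration; the structural connectivity check is left out of this sketch.

```python
def find_validation_problems(
    required: dict[str, set[str]],
    schemas: dict[str, set[str]],
) -> list[str]:
    """Return a list of problems; an empty list means validation passed."""
    problems: list[str] = []
    for dataset, columns in sorted(required.items()):
        if dataset not in schemas:
            problems.append(f"dataset not found: {dataset}")
            continue
        # Report every required attribute the dataset does not provide.
        for column in sorted(columns - schemas[dataset]):
            problems.append(f"missing attribute: {dataset}.{column}")
    return problems
```

A non-empty result corresponds to the case where execution does not proceed and the listed problems become the basis for a clarification request.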

Step 6: Execute Analysis

Execution occurs only after:

Context is sufficient
Relationships are confirmed
Required data elements exist

Execution is intentional and validated.

How askEdgi Handles Missing or Incomplete Information

askEdgi stops intentionally to ensure accurate and trustworthy results.

askEdgi stops when:

Required data is missing
Structural compatibility cannot be confirmed
Business context is unclear
Confidence is insufficient

askEdgi does not:

Guess
Partially execute
Assume schema

This results in predictable and trustworthy outcomes.
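The stop-instead-of-guess behavior can be sketched as a gate in front of execution: if any precondition fails, nothing runs and the caller receives a clarification request. `Outcome`, `run_if_ready`, and the check names are hypothetical and are used here only to illustrate the all-or-nothing pattern.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Outcome:
    executed: bool
    message: str


def run_if_ready(checks: dict[str, bool], execute: Callable[[], str]) -> Outcome:
    """Run `execute` only when every check passed; otherwise ask for clarification."""
    failed = [name for name, passed in checks.items() if not passed]
    if failed:
        # No partial execution: `execute` is never called on this path.
        return Outcome(False, "Clarification needed: " + ", ".join(failed))
    return Outcome(True, execute())
```

Because the gate returns before calling `execute`, a failed check can never produce a partial result.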

RAG Trust in Analysis Mode

The RAG framework in askEdgi relies on controlled enterprise grounding.

RAG uses the following information sources:

Curated business descriptions
Technical documentation
Structured metadata
Relationship context between assets
Lineage information
Contextual data statistics, such as sample data characteristics
Top 50 values

Note: Contextual data statistics and top 50 values improve relevance and interpretation. Governed metadata and business definitions remain the primary reference.

askEdgi enforces the following controls:

Respect governance and access rules
Validate structural compatibility before dataset combination
Confirm required attributes before execution
Stop when information remains incomplete
Maintain clear separation between discovery and execution

This layered grounding ensures business-aligned, structurally valid, and explainable responses.

RAG Limitations

Certain behaviors remain intentionally restricted to maintain trust.

askEdgi avoids:

Guessing or fabricating answers
Ignoring governance controls
Combining unrelated datasets
Expanding excessively across the data ecosystem
Executing with incomplete schema validation

Restraint remains a core system principle.

Business Impact

Organizations gain the following benefits:

Faster and safer data discovery
Reduced dependency on technical teams
Fewer incorrect dataset combinations
Strong structural validation before analysis
Higher trust in analytics and reporting
Streamlined workflows without mode confusion
Better alignment between business and data teams

askEdgi serves as a reliable entry point to enterprise data knowledge and trusted analysis.

Summary

RAG forms the foundation that makes askEdgi:

Context-aware instead of generic
Relationship-aware instead of isolated
Schema-aware instead of assumptive
Dependency-aware instead of speculative
Trusted instead of uncertain
Business-aligned instead of technically driven

Clear separation between Analysis Mode and Discovery Mode, combined with structured validation before execution, ensures intentional, explainable, and trustworthy interactions.
