AI Retriever & RAG Monitoring

Monitor every knowledge retrieval your AI agents make with complete visibility into what content was found, how relevant it was, and where it came from.

The Problem

Understanding why AI agents give certain answers requires seeing what knowledge they retrieved and how they chose it.

Teams need to:

  • 🔍 Debug Wrong Answers: See exactly what content the AI retrieved when it gave incorrect information
  • 📊 Measure Quality: Know if the right content is being found with good relevance scores
  • ⚡ Optimize Performance: Identify slow retrievals and reduce response times
  • 📚 Find Content Gaps: Discover what knowledge is missing from your knowledge base
  • 🎯 Validate Sources: Trace retrieved content back to source Salesforce records
  • 🔬 A/B Test Retrievers: Compare different retriever configurations objectively

In short: You need complete transparency into your RAG (Retrieval-Augmented Generation) system to build reliable AI agents.

How GenAI Explorer Solves This

GenAI Explorer provides complete RAG visibility with:

Request Timeline: See every query your agents make

  • Query text and context
  • When requests were made
  • Number of results returned
  • Average response times

Response Ranking: Results sorted by relevance

  • Best match highlighted automatically
  • Relevance scores as percentages
  • Quick visual scanning

Execution Time Tracking: Monitor performance

  • Individual response times
  • Average time per request
  • Identify slow queries

Source Tracing: Know where content comes from

  • Clickable links to source records
  • Data source identification
  • Object type information

Content Preview: See what AI actually saw

  • First 2 lines of retrieved text
  • Clean truncation
  • Quick verification
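
If you want a similar preview when building your own reports from exported data, here is a minimal Python sketch. The exact truncation rules GenAI Explorer applies are not documented here, so treat this as an approximation:

```python
def preview(result_text: str, max_lines: int = 2, max_chars: int = 160) -> str:
    """Return a short, clean preview of retrieved content.

    Approximates the "first 2 lines" preview; GenAI Explorer's exact
    truncation rules may differ.
    """
    lines = [line.strip() for line in result_text.splitlines() if line.strip()]
    snippet = " ".join(lines[:max_lines])
    return snippet if len(snippet) <= max_chars else snippet[:max_chars].rstrip() + "…"
```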

Quality Metrics: Measure retrieval effectiveness

  • Score distribution analysis
  • Performance trends
  • Content gap identification

Impact: Debug agent issues in minutes, optimize retrieval quality with data, and build confidence in AI responses with source tracing.

How It Works

Features

Request Timeline

Every retrieval request is captured with:

┌─────────────────────────────────────────────────────────────┐
│ 📋 Request: "How do I reset my password?" │
│ 🕐 Dec 9, 2025, 10:30:45 AM (5 minutes ago) │
│ 📊 3 results | ⏱️ Avg: 1.23s │
└─────────────────────────────────────────────────────────────┘
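
The summary line is derived from one request record and its linked responses: the result count is the number of responses, and the average is the mean of their individual execution times. A minimal sketch of that aggregation (the class and field names here are illustrative, not the product's API):

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class RetrieverResponse:
    score: float              # relevance score, 0-1
    execution_seconds: float  # time taken for this result

@dataclass
class RetrieverRequest:
    query_text: str
    requested_at: datetime
    responses: list[RetrieverResponse]

def timeline_summary(req: RetrieverRequest) -> str:
    """Build the one-line summary shown in the request timeline."""
    avg = mean(r.execution_seconds for r in req.responses) if req.responses else 0.0
    ts = req.requested_at.strftime("%b %d, %Y, %I:%M:%S %p")
    return f'📋 "{req.query_text}" | 🕐 {ts} | 📊 {len(req.responses)} results | ⏱️ Avg: {avg:.2f}s'
```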

Best Match Highlighting

The top-ranked result gets special visual treatment:

Score Interpretation

Performance Tracking

Use Cases

Use Case 1: Debugging Wrong Answers

Steps:

  1. Open GenAI Explorer and navigate to your Retriever
  2. Find the specific query that led to the wrong answer
  3. Check what content was actually retrieved
  4. Verify the relevance score
  5. Click through to source records
  6. Identify the issue and fix it

Use Case 2: Measuring Quality

Track quality metrics over time:

Before GenAI Explorer:

❓ Is our RAG working well?
❓ How good is the knowledge base?
❓ Should we add more data?

After GenAI Explorer:

✅ 85% of queries get >90% score
⚠️ 10% of queries get <70% score
📝 Action: Add content for low-scoring topics
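
These percentages are easy to recompute from exported data. A sketch assuming the responses have been exported to a CSV with the columns described under "Data Cloud Objects" (the file name is illustrative):

```python
import pandas as pd

# Assumption: responses exported (e.g. via the Data Cloud Integration feature)
# to a CSV with the columns listed under "Data Cloud Objects".
responses = pd.read_csv("ai_retriever_responses.csv")

# Score each query by its best match: the max 0-1 score per request, as a percentage.
best = responses.groupby("aiRetrieverRequestId__c")["scoreNumber__c"].max() * 100

print(f"✅ {(best > 90).mean():.0%} of queries get >90% score")
print(f"⚠️ {(best < 70).mean():.0%} of queries get <70% score")
```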

Use Case 3: Performance Optimization

Use Case 4: Content Gap Analysis

Identify missing knowledge by analyzing low-scoring queries:

Action Plan (steps 1 and 2 are sketched below):

  1. Export low-scoring queries
  2. Group by topic
  3. Create missing content
  4. Measure improvement
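
A sketch of steps 1 and 2, assuming requests and responses have been exported to CSVs with the columns described under "Data Cloud Objects". The keyword grouping here is deliberately crude; replace it with whatever topic clustering you already use:

```python
import pandas as pd

requests = pd.read_csv("ai_retriever_requests.csv")
responses = pd.read_csv("ai_retriever_responses.csv")

# Best score per request, joined back to the query text.
best = responses.groupby("aiRetrieverRequestId__c")["scoreNumber__c"].max().rename("best_score")
queries = requests.merge(best, left_on="id__c", right_index=True)

# Step 1: export the low-scoring queries (best match below 70%).
low = queries[queries["best_score"] < 0.70]
low[["queryText__c", "best_score"]].to_csv("low_scoring_queries.csv", index=False)

# Step 2: crude topic grouping by the first two words of each query.
topics = low["queryText__c"].str.lower().str.split().str[:2].str.join(" ").value_counts()
print(topics.head(10))
```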

Use Case 5: A/B Testing

Compare two retriever configurations:

Metric         Retriever A    Retriever B    Winner
Avg Score      87.3%          92.1%          🏆 B
Avg Time       1.8s           1.2s           🏆 B
Avg Results    3.2            4.5            🏆 B

Decision: Deploy Retriever B based on objective metrics
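
With the same query set run through both configurations, the comparison above reduces to a group-by on retrieverApiName__c. A sketch over exported data; note that per-response execution time is approximated here as response timestamp minus request timestamp, which is an assumption rather than a documented field:

```python
import pandas as pd

# Assumption: both retrievers ran the same query set; exports use the columns
# listed under "Data Cloud Objects". File names are illustrative.
requests = pd.read_csv("ai_retriever_requests.csv", parse_dates=["requestTimestamp__c"])
responses = pd.read_csv("ai_retriever_responses.csv", parse_dates=["responseTimestamp__c"])

joined = responses.merge(
    requests, left_on="aiRetrieverRequestId__c", right_on="id__c",
    suffixes=("_resp", "_req"),
)
# Approximate execution time as response timestamp minus request timestamp.
joined["exec_seconds"] = (
    joined["responseTimestamp__c"] - joined["requestTimestamp__c"]
).dt.total_seconds()

per_request = joined.groupby(["retrieverApiName__c", "aiRetrieverRequestId__c"]).agg(
    avg_score=("scoreNumber__c", "mean"),
    avg_time=("exec_seconds", "mean"),
    results=("scoreNumber__c", "size"),
)
print(per_request.groupby("retrieverApiName__c").mean())  # one row per retriever
```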

Workflow

Best Practices

Do's and Don'ts

Do:

  • Monitor quality weekly
  • Track score trends
  • Verify source records
  • Test changes before deploying
  • Set performance baselines

Don't:

  • Ignore low scores (<70%)
  • Assume high score = correct answer
  • Forget to check source records
  • Let execution time grow unchecked
  • Make changes without testing

Success Metrics

Recommended Targets (a quick programmatic check is sketched below):

  • Quality: >80% relevance score on 85% of queries
  • Performance: <2s execution time on 95% of responses
  • Coverage: 3+ relevant results on 90% of requests
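
A sketch of that check, assuming you have already built a per-request table (for example with the earlier snippets); the file and column names are illustrative:

```python
import pandas as pd

# Assumption: one row per request with best_score (percent), exec_seconds,
# and result_count columns, produced by the earlier aggregation sketches.
per_request = pd.read_csv("per_request_metrics.csv")

quality  = (per_request["best_score"] > 80).mean()     # target: >= 0.85
latency  = (per_request["exec_seconds"] < 2.0).mean()  # target: >= 0.95
coverage = (per_request["result_count"] >= 3).mean()   # target: >= 0.90

print(f"Quality:     {quality:.0%} of queries score >80% (target 85%)")
print(f"Performance: {latency:.0%} of responses under 2s (target 95%)")
print(f"Coverage:    {coverage:.0%} of requests return 3+ results (target 90%)")
```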

Weekly Dashboard Example

┌─────────────────────────────────────┐
│ 📊 This Week's Performance │
├─────────────────────────────────────┤
│ Queries: 1,247 │
│ Avg Score: 87.3% ✅ │
│ Avg Time: 1.45s ✅ │
│ Low Scores: 78 (6.2%) ⚠️ │
└─────────────────────────────────────┘

Getting Started

GenAI Explorer → Data → Search "Retriever" → Select Your Retriever

Key Visual Indicators

Visual             Meaning
🏆 Green border    Best match result
>90% score         Excellent retrieval
80-90% score       Good retrieval
<70% score         Review needed
<1s time           Fast performance
>2s time           Slow, needs optimization

Quick Actions

Data Cloud Objects

AIRetrieverRequest__dll

Stores each retrieval request:

  • id__c - Unique request ID
  • queryText__c - The search query
  • requestTimestamp__c - When requested
  • retrieverApiName__c - Which retriever was used

AIRetrieverResponse__dll

Stores each retrieved result:

  • id__c - Unique response ID
  • resultText__c - Retrieved content (JSON)
  • scoreNumber__c - Relevance score (0-1)
  • responseTimestamp__c - When returned
  • aiRetrieverRequestId__c - Links to request
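
The two objects join on aiRetrieverRequestId__c, and scoreNumber__c is stored as 0-1 while the UI shows percentages. A sketch that rebuilds the per-request ranking view from exported CSVs (file names are illustrative; the JSON schema inside resultText__c is not documented here):

```python
import json
import pandas as pd

requests = pd.read_csv("ai_retriever_requests.csv")
responses = pd.read_csv("ai_retriever_responses.csv")

for _, req in requests.iterrows():
    # Responses link back to their request through aiRetrieverRequestId__c.
    hits = responses[responses["aiRetrieverRequestId__c"] == req["id__c"]]
    hits = hits.sort_values("scoreNumber__c", ascending=False)  # best match first
    print(f'📋 {req["queryText__c"]} ({len(hits)} results)')
    for rank, (_, hit) in enumerate(hits.iterrows(), start=1):
        payload = json.loads(hit["resultText__c"])  # JSON payload; schema not documented here
        print(f'  {rank}. {hit["scoreNumber__c"]:.0%}  {str(payload)[:80]}')
```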

FAQs

Q: How long is retrieval data stored?
A: Data Cloud retention policies apply (typically 30-90 days).

Q: Can I export retrieval data?
A: Yes, use the Data Cloud Integration feature to query and export.

Q: What's a good relevance score?
A: >80% is good, >90% is excellent. Below 70% needs investigation.

Q: How do I improve low scores?
A: Add more relevant content to your knowledge base or update existing content.

Q: Can I compare retrievers?
A: Yes, run the same queries through different retrievers and compare metrics.


Related Features: