AI Retriever & RAG Monitoring
Monitor every knowledge retrieval your AI agents make with complete visibility into what content was found, how relevant it was, and where it came from.
The Problem
Understanding why AI agents give certain answers requires seeing what knowledge they retrieved and how they chose it.
Teams need to:
- 🔍 Debug Wrong Answers: See exactly what content the AI retrieved when it gave incorrect information
- 📊 Measure Quality: Know if the right content is being found with good relevance scores
- ⚡ Optimize Performance: Identify slow retrievals and optimize response times
- 📚 Find Content Gaps: Discover what knowledge is missing from your knowledge base
- 🎯 Validate Sources: Trace retrieved content back to source Salesforce records
- 🔬 A/B Test Retrievers: Compare different retriever configurations objectively
In short: You need complete transparency into your RAG (Retrieval-Augmented Generation) system to build reliable AI agents.
How GenAI Explorer Solves This
GenAI Explorer provides complete RAG visibility with:
✅ Request Timeline: See every query your agents make
- Query text and context
- When requests were made
- Number of results returned
- Average response times
✅ Response Ranking: Results sorted by relevance
- Best match highlighted automatically
- Relevance scores as percentages
- Quick visual scanning
✅ Execution Time Tracking: Monitor performance
- Individual response times
- Average time per request
- Identify slow queries
✅ Source Tracing: Know where content comes from
- Clickable links to source records
- Data source identification
- Object type information
✅ Content Preview: See what AI actually saw
- First 2 lines of retrieved text
- Clean truncation
- Quick verification
✅ Quality Metrics: Measure retrieval effectiveness
- Score distribution analysis
- Performance trends
- Content gap identification
Impact: Debug agent issues in minutes, optimize retrieval quality with data, and build confidence in AI responses with source tracing.
How It Works
Features
Request Timeline
Every retrieval request is captured with:
┌─────────────────────────────────────────────────────────────┐
│ 📋 Request: "How do I reset my password?" │
│ 🕐 Dec 9, 2025, 10:30:45 AM (5 minutes ago) │
│ 📊 3 results | ⏱️ Avg: 1.23s │
└─────────────────────────────────────────────────────────────┘
Best Match Highlighting
The top-ranked result gets special visual treatment: a green border and a 🏆 marker flag it as the best match, so the strongest result can be spotted at a glance.
Score Interpretation
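The score bands used throughout this guide (>90% excellent, 80-90% good, <70% review needed) can be applied programmatically when analyzing exported scores. A minimal Python sketch; the function name is illustrative, and the "fair" label for 70-80% is an assumption since this guide does not band that range explicitly:

```python
def interpret_score(score_number: float) -> str:
    """Map a scoreNumber__c value (0-1) onto the bands used in this guide."""
    pct = score_number * 100
    if pct > 90:
        return "excellent"      # >90%: excellent retrieval
    if pct >= 80:
        return "good"           # 80-90%: good retrieval
    if pct >= 70:
        return "fair"           # 70-80%: watch (not explicitly banded in this guide)
    return "review needed"      # <70%: investigate / add content


print(interpret_score(0.93))  # excellent
print(interpret_score(0.65))  # review needed
```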
Performance Tracking
Use Cases
Use Case 1: Debugging Wrong Answers
Steps:
- Open GenAI Explorer and navigate to your Retriever
- Find the specific query that led to the wrong answer
- Check what content was actually retrieved
- Verify the relevance score
- Click through to source records
- Identify the issue and fix it
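If the underlying rows have been exported (for example via the Data Cloud Integration feature mentioned in the FAQs), the retrieval-inspection part of these steps can be scripted. A minimal sketch, assuming the rows are plain dictionaries keyed by the field names documented under Data Cloud Objects; the export format and function name are assumptions:

```python
# Assumption: requests and responses were exported as lists of dicts keyed by
# the field names documented under "Data Cloud Objects" below.
def inspect_query(requests, responses, query_fragment):
    """Print what was retrieved for every request whose query matches query_fragment."""
    for req in requests:
        if query_fragment.lower() not in req["queryText__c"].lower():
            continue
        print(f'Request {req["id__c"]}: "{req["queryText__c"]}"')
        print(f'  at {req["requestTimestamp__c"]} via {req["retrieverApiName__c"]}')
        hits = [r for r in responses if r["aiRetrieverRequestId__c"] == req["id__c"]]
        hits.sort(key=lambda r: r["scoreNumber__c"], reverse=True)
        for rank, hit in enumerate(hits, start=1):
            preview = str(hit["resultText__c"])[:120]   # resultText__c holds the JSON content
            print(f'  #{rank}  score={hit["scoreNumber__c"]:.0%}  {preview}')
```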
Use Case 2: Measuring Quality
Track quality metrics over time:
Before GenAI Explorer:
❓ Is our RAG working well?
❓ How good is the knowledge base?
❓ Should we add more data?
After GenAI Explorer:
✅ 85% of queries get >90% score
⚠️ 10% of queries get <70% score
📝 Action: Add content for low-scoring topics
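The "after" figures above come straight from the score distribution. A minimal sketch, assuming each query's best score has been collected as a 0-1 value (as stored in `scoreNumber__c`):

```python
def quality_summary(best_scores):
    """Share of queries whose best score falls in the bands used in this guide."""
    total = len(best_scores)
    high = sum(1 for s in best_scores if s > 0.90)   # >90%: excellent
    low = sum(1 for s in best_scores if s < 0.70)    # <70%: review needed
    return {
        "queries": total,
        "share_above_90": high / total if total else 0.0,
        "share_below_70": low / total if total else 0.0,
    }


print(quality_summary([0.95, 0.92, 0.88, 0.65]))
# {'queries': 4, 'share_above_90': 0.5, 'share_below_70': 0.25}
```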
Use Case 3: Performance Optimization
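Execution times can be derived from the captured timestamps. A minimal sketch for surfacing requests slower than the 2-second target recommended under Success Metrics, assuming timestamps are exported as ISO-8601 strings and rows are keyed by the documented field names:

```python
from datetime import datetime

# Assumption: execution time is approximated as the gap between
# requestTimestamp__c and the latest matching responseTimestamp__c.
def slow_requests(requests, responses, threshold_s=2.0):
    """Return (query, seconds) pairs slower than the >2s 'needs optimization' band."""
    slow = []
    for req in requests:
        hits = [r for r in responses if r["aiRetrieverRequestId__c"] == req["id__c"]]
        if not hits:
            continue
        started = datetime.fromisoformat(req["requestTimestamp__c"])
        finished = max(datetime.fromisoformat(r["responseTimestamp__c"]) for r in hits)
        elapsed = (finished - started).total_seconds()
        if elapsed > threshold_s:
            slow.append((req["queryText__c"], elapsed))
    return sorted(slow, key=lambda pair: pair[1], reverse=True)
```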
Use Case 4: Content Gap Analysis
Identify missing knowledge by analyzing low-scoring queries:
Action Plan:
- Export low-scoring queries
- Group by topic
- Create missing content
- Measure improvement
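A naive first pass at the "group by topic" step is to count keywords across the exported low-scoring queries; a proper topic model or clustering step would replace this. A minimal sketch (the stopword list is illustrative):

```python
from collections import Counter

STOPWORDS = {"how", "do", "i", "the", "a", "to", "my", "is", "what", "can"}

def keyword_groups(low_scoring_queries, top_n=10):
    """Count candidate topic keywords across low-scoring queries."""
    words = Counter()
    for query in low_scoring_queries:
        tokens = (w.strip("?.,!:;") for w in query.lower().split())
        words.update(t for t in tokens if t and t not in STOPWORDS)
    return words.most_common(top_n)


print(keyword_groups(["How do I reset my password?", "Password reset link expired"]))
# [('reset', 2), ('password', 2), ('link', 1), ('expired', 1)]
```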
Use Case 5: A/B Testing
Compare two retriever configurations:
| Metric | Retriever A | Retriever B | Winner |
|---|---|---|---|
| Avg Score | 87.3% | 92.1% | 🏆 B |
| Avg Time | 1.8s | 1.2s | 🏆 B |
| Avg Results | 3.2 | 4.5 | 🏆 B |
Decision: Deploy Retriever B based on objective metrics
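The comparison table can be built from a single export covering both retrievers, since each request records `retrieverApiName__c`. A minimal sketch, assuming rows are dicts keyed by the field names documented under Data Cloud Objects:

```python
from statistics import mean

def retriever_metrics(requests, responses):
    """Per-retriever averages for the A/B table: best score and result count per request."""
    by_retriever = {}
    for req in requests:
        hits = [r for r in responses if r["aiRetrieverRequestId__c"] == req["id__c"]]
        if not hits:
            continue
        scores, counts = by_retriever.setdefault(req["retrieverApiName__c"], ([], []))
        scores.append(max(r["scoreNumber__c"] for r in hits))
        counts.append(len(hits))
    return {
        name: {"avg_score": mean(scores), "avg_results": mean(counts)}
        for name, (scores, counts) in by_retriever.items()
    }
```

Average execution time per retriever can be added using the timestamp approach sketched under Use Case 3.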
Workflow
Best Practices
Do's and Don'ts
✅ Do:
- Monitor quality weekly
- Track score trends
- Verify source records
- Test changes before deploying
- Set performance baselines
❌ Don't:
- Ignore low scores (<70%)
- Assume high score = correct answer
- Forget to check source records
- Let execution time grow unchecked
- Make changes without testing
Success Metrics
Recommended Targets:
- Quality: >80% relevance score on 85% of queries
- Performance: <2s execution time on 95% of responses
- Coverage: 3+ relevant results on 90% of requests
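These targets translate directly into an automated check. A minimal sketch, assuming one summary record per request with its best score (0-1), elapsed seconds, and result count; the record shape is an assumption, while the thresholds are the recommended targets above:

```python
def meets_targets(per_request):
    """per_request: iterable of dicts with 'best_score', 'elapsed_s', 'result_count'."""
    rows = list(per_request)
    total = len(rows) or 1
    quality = sum(r["best_score"] > 0.80 for r in rows) / total
    speed = sum(r["elapsed_s"] < 2.0 for r in rows) / total
    coverage = sum(r["result_count"] >= 3 for r in rows) / total
    return {
        "quality_ok": quality >= 0.85,    # >80% score on 85% of queries
        "speed_ok": speed >= 0.95,        # <2s on 95% of responses
        "coverage_ok": coverage >= 0.90,  # 3+ results on 90% of requests
    }
```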
Weekly Dashboard Example
┌─────────────────────────────────────┐
│ 📊 This Week's Performance │
├─────────────────────────────────────┤
│ Queries: 1,247 │
│ Avg Score: 87.3% ✅ │
│ Avg Time: 1.45s ✅ │
│ Low Scores: 78 (6.2%) ⚠️ │
└─────────────────────────────────────┘
Getting Started
Navigation
GenAI Explorer → Data → Search "Retriever" → Select Your Retriever
Key Visual Indicators
| Visual | Meaning |
|---|---|
| 🏆 Green border | Best match result |
| >90% score | Excellent retrieval |
| 80-90% score | Good retrieval |
| <70% score | Review needed |
| <1s time | Fast performance |
| >2s time | Slow, needs optimization |
Quick Actions
Data Cloud Objects
AIRetrieverRequest__dll
Stores each retrieval request:
- `id__c` - Unique request ID
- `queryText__c` - The search query
- `requestTimestamp__c` - When requested
- `retrieverApiName__c` - Which retriever was used
AIRetrieverResponse__dll
Stores each retrieved result:
- `id__c` - Unique response ID
- `resultText__c` - Retrieved content (JSON)
- `scoreNumber__c` - Relevance score (0-1)
- `responseTimestamp__c` - When returned
- `aiRetrieverRequestId__c` - Links to request
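When working with exported rows, the two objects can be modeled as lightweight records and joined on `aiRetrieverRequestId__c`. A minimal sketch using only the fields documented above; the class and helper names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class RetrieverRequest:
    id: str                    # id__c
    query_text: str            # queryText__c
    request_timestamp: str     # requestTimestamp__c
    retriever_api_name: str    # retrieverApiName__c

@dataclass
class RetrieverResponse:
    id: str                    # id__c
    result_text: str           # resultText__c (JSON content)
    score_number: float        # scoreNumber__c (0-1)
    response_timestamp: str    # responseTimestamp__c
    request_id: str            # aiRetrieverRequestId__c (links to request)


def responses_for(request: RetrieverRequest,
                  responses: list[RetrieverResponse]) -> list[RetrieverResponse]:
    """All responses linked to a request, best score first."""
    linked = [r for r in responses if r.request_id == request.id]
    return sorted(linked, key=lambda r: r.score_number, reverse=True)
```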
FAQs
Q: How long is retrieval data stored?
A: Data Cloud retention policies apply (typically 30-90 days).
Q: Can I export retrieval data?
A: Yes, use the Data Cloud Integration feature to query and export.
Q: What's a good relevance score?
A: >80% is good, >90% is excellent. Below 70% needs investigation.
Q: How do I improve low scores?
A: Add more relevant content to your knowledge base or update existing content.
Q: Can I compare retrievers?
A: Yes, run the same queries through different retrievers and compare metrics.
Related Features: