Custom Evaluation Criteria
Test specific strings or numbers in agent responses, extending testing capabilities beyond standard expectations like topics and actions.
The Problem
Standard test expectations don't cover all testing needs.
When testing AI agents, teams need to validate:
- 🎯 Action Inputs: "Did the agent pass the correct account ID?"
- 📊 Action Outputs: "Did the API call return success?"
- ⚡ Performance: "Was the response time under 10 seconds?"
- 🔢 Counts: "Were fewer than 5 API calls made?"
- 📝 Data Formats: "Is the phone number formatted correctly?"
- ✉️ Content Validation: "Does the email contain the customer's name?"
In short: You need granular control over what gets tested in agent responses.
How GenAI Explorer Solves This
GenAI Explorer provides custom evaluation criteria with:
✅ Two Evaluation Types
- String comparisons (equals, contains, starts with, ends with)
- Numeric comparisons (equals, greater than, less than, etc.)
✅ JSONPath Expressions
- Extract any value from test results
- Target specific action inputs/outputs
- Access nested data structures
✅ Visual Editor
- Guided dialog for creating evaluations
- JSONPath pattern examples
- Real-time preview of logic
✅ Flexible Assertions
- Test exact matches or partial matches
- Compare ranges and thresholds
- Validate data formats
Custom Evaluation Types
String Comparison
Test text values with four operators:
| Operator | Description | Example |
|---|---|---|
| equals | Exact match (case-sensitive) | "Jon" equals "Jon" ✓ |
| contains | Substring match | "Hello Jon" contains "Jon" ✓ |
| startswith | Prefix match | "Hello World" starts with "Hello" ✓ |
| endswith | Suffix match | "example.com" ends with ".com" ✓ |
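The semantics of the four string operators can be sketched in plain Python. This is an illustrative stand-in, not the product's actual implementation; the operator names match the table above, but the function name is made up for the example.

```python
def evaluate_string(operator: str, actual: str, expected: str) -> bool:
    """Illustrative sketch of the four string operators (all case-sensitive)."""
    if operator == "equals":
        return actual == expected           # exact match
    if operator == "contains":
        return expected in actual           # substring match
    if operator == "startswith":
        return actual.startswith(expected)  # prefix match
    if operator == "endswith":
        return actual.endswith(expected)    # suffix match
    raise ValueError(f"Unknown operator: {operator}")

print(evaluate_string("contains", "Hello Jon", "Jon"))  # True
```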
Numeric Comparison
Test numbers with five operators:
| Operator | Symbol | Description | Example |
|---|---|---|---|
| equals | = | Equal to | 10 = 10 ✓ |
| greater_than | > | Greater than | 15 > 10 ✓ |
| greater_than_or_equal | >= | Greater or equal | 10 >= 10 ✓ |
| less_than | < | Less than | 5 < 10 ✓ |
| less_than_or_equal | <= | Less or equal | 10 <= 10 ✓ |
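The five numeric operators map directly onto standard comparison functions. A minimal sketch, again illustrative rather than the product's code, using Python's stdlib `operator` module:

```python
import operator as op

# Map the operator names from the table above to comparison functions.
NUMERIC_OPS = {
    "equals": op.eq,
    "greater_than": op.gt,
    "greater_than_or_equal": op.ge,
    "less_than": op.lt,
    "less_than_or_equal": op.le,
}

def evaluate_numeric(name: str, actual: float, expected: float) -> bool:
    """Apply one of the five numeric operators to actual vs. expected."""
    return NUMERIC_OPS[name](actual, expected)

print(evaluate_numeric("less_than", 2500, 10000))  # True
```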
How to Add Custom Evaluations
Step 1: Open Test Editor
- Navigate to your AI Evaluation Definition
- Click Edit on a test case
- Scroll to the 🔬 Custom Evaluations section
- Click "Add Custom Evaluation"
Step 2: Configure Evaluation
Example 1: Validate Email Recipient
Evaluation Type: String Comparison
Label: Expected recipient match
Operator: equals
☑ Actual value is a JSONPath reference
Actual Value:
$.generatedData.invokedActions[*][?(@.function.name == 'DraftGenericReplyEmail')].function.input.recipient
Expected Value: alice@example.com
Example 2: Check Subject Contains Keyword
Evaluation Type: String Comparison
Label: Subject mentions project
Operator: contains
☑ Actual value is a JSONPath reference
Actual Value:
$.generatedData.invokedActions[*][?(@.function.name == 'DraftGenericReplyEmail')].function.input.subject
Expected Value: project
Example 3: Response Time Threshold
Evaluation Type: Numeric Comparison
Label: Response under 10 seconds
Operator: less_than
☑ Actual value is a JSONPath reference
Actual Value:
$.generatedData.latencyMs
Expected Value: 10000
Step 3: Save and Test
- Click "Add Evaluation"
- Review the custom evaluation card
- Click "Save Changes" on the test editor
- Run your test to see results
Understanding JSONPath
JSONPath expressions extract values from the test results JSON structure.
Test Results Structure
When you run a test, the agent generates data like this:
{
"generatedData": {
"invokedActions": [
{
"function": {
"name": "DraftGenericReplyEmail",
"input": {
"recipient": "alice@example.com",
"subject": "Re: Project Update"
},
"result": "Success"
}
}
],
"latencyMs": 2500
}
}
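The JSONPath patterns that follow all navigate this structure. As a sanity check, here is the same payload as a Python dict with plain lookups equivalent to two of the paths (a sketch for orientation, not part of the product):

```python
# The example test-result payload from above, as a Python dict.
results = {
    "generatedData": {
        "invokedActions": [
            {
                "function": {
                    "name": "DraftGenericReplyEmail",
                    "input": {
                        "recipient": "alice@example.com",
                        "subject": "Re: Project Update",
                    },
                    "result": "Success",
                }
            }
        ],
        "latencyMs": 2500,
    }
}

# Equivalent of $.generatedData.latencyMs
print(results["generatedData"]["latencyMs"])  # 2500

# Equivalent of the first action's .function.input.recipient
first = results["generatedData"]["invokedActions"][0]
print(first["function"]["input"]["recipient"])  # alice@example.com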
Common JSONPath Patterns
Get Action Input Field
$.generatedData.invokedActions[*][?(@.function.name == 'ActionName')].function.input.fieldName
Explanation:
- $.generatedData - Start at the root's generatedData object
- .invokedActions[*] - Look through all invoked actions
- [?(@.function.name == 'ActionName')] - Filter to the specific action
- .function.input.fieldName - Get the input field
Example:
$.generatedData.invokedActions[*][?(@.function.name == 'DraftGenericReplyEmail')].function.input.recipient
Returns: "alice@example.com"
Get Action Output
$.generatedData.invokedActions[*][?(@.function.name == 'ActionName')].function.result
Example:
$.generatedData.invokedActions[*][?(@.function.name == 'CreateCase')].function.result
Returns: "Success"
Get Performance Metric
$.generatedData.latencyMs
Returns: 2500
How Filtering Works
Given this data:
{
"generatedData": {
"invokedActions": [
{
"function": {
"name": "DraftGenericReplyEmail",
"input": { "recipient": "alice@example.com" }
}
},
{
"function": {
"name": "OtherFunction",
"input": { "recipient": "bob@example.com" }
}
}
]
}
}
This expression:
$.generatedData.invokedActions[*][?(@.function.name == 'DraftGenericReplyEmail')].function.input.recipient
Returns: "alice@example.com" (not bob's email)
Why? The filter [?(@.function.name == 'DraftGenericReplyEmail')] matches only actions whose function name is DraftGenericReplyEmail, so OtherFunction's input (bob's email) is excluded.
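The filter's behavior can be mimicked in plain Python, which is also a handy way to debug an expression against real data. The helper name below is made up for illustration:

```python
def get_action_input(results: dict, action_name: str, field: str):
    """Return the given input field from the first invoked action whose
    function name matches action_name; None if no action matches."""
    for action in results["generatedData"]["invokedActions"]:
        fn = action["function"]
        if fn["name"] == action_name:
            return fn["input"].get(field)
    return None

data = {
    "generatedData": {
        "invokedActions": [
            {"function": {"name": "DraftGenericReplyEmail",
                          "input": {"recipient": "alice@example.com"}}},
            {"function": {"name": "OtherFunction",
                          "input": {"recipient": "bob@example.com"}}},
        ]
    }
}

print(get_action_input(data, "DraftGenericReplyEmail", "recipient"))  # alice@example.com
```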
Real-World Use Cases
1. Validate Action Inputs
Scenario: Ensure correct account ID is passed
Type: String Comparison
Operator: startswith
Actual: $.generatedData.invokedActions[*][?(@.function.name == 'GetAccountInfo')].function.input.accountId
Expected: 001
2. Verify Action Success
Scenario: Check case creation succeeded
Type: String Comparison
Operator: contains
Actual: $.generatedData.invokedActions[*][?(@.function.name == 'CreateCase')].function.result
Expected: Success
3. Performance Testing
Scenario: Response within 5 seconds
Type: Numeric Comparison
Operator: less_than_or_equal
Actual: $.generatedData.latencyMs
Expected: 5000
4. API Call Limits
Scenario: Maximum 3 external calls
Type: Numeric Comparison
Operator: less_than_or_equal
Actual: $.generatedData.apiCallCount
Expected: 3
5. Data Format Validation
Scenario: Phone number has country code
Type: String Comparison
Operator: startswith
Actual: $.generatedData.invokedActions[*][?(@.function.name == 'FormatPhoneNumber')].function.result
Expected: +1
6. Content Requirements
Scenario: Email mentions customer name
Type: String Comparison
Operator: contains
Actual: $.generatedData.invokedActions[*][?(@.function.name == 'DraftEmail')].function.input.body
Expected: {{customerName}}
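Any of these use cases boils down to the same shape: an operator, an actual value extracted from the results, and an expected value. A sketch of use case 1 as data, with hypothetical field names and a made-up account ID (not the product's schema):

```python
# Hypothetical representation of use case 1 (field names are illustrative).
evaluation = {
    "type": "string",
    "operator": "startswith",
    "label": "Account ID starts with 001",
    "expected": "001",
}

# A value that the JSONPath lookup for GetAccountInfo's accountId might return.
actual = "0015g00000XyZabAAB"

passed = actual.startswith(evaluation["expected"])
print("PASS" if passed else "FAIL")  # PASS
```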
Test Results Display
Custom evaluations appear as purple columns (🔬) in the test cases table:
Pass (✓):
- Green checkmark
- Shows comparison details
- Displays "10 < 10000" or "text contains 'keyword'"
Fail (✗):
- Red error icon
- Shows why it failed
- Displays "alice ≠ bob" or "15 ≥ 10"
No Data (⚠):
- Orange warning
- "No test session found to evaluate"
- Run the test to get actual values
Best Practices
1. Use Descriptive Labels
✅ Good:
- "Expected recipient match"
- "Subject contains project reference"
- "Response time under 10 seconds"
❌ Bad:
- "Test 1"
- "Check"
- "Custom eval"
2. Validate JSONPath First
Before using complex expressions:
- Run the test to generate results
- Inspect the generatedData structure
- Test your JSONPath against the actual data
- Verify it returns the expected value
3. Remember Case Sensitivity
String operators are case-sensitive:
"Jon" equals "jon" ❌ Does not match
"Jon" equals "Jon" ✅ Matches
There is no case-insensitive operator, so match with contains against a substring whose case you know is stable.
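A quick demonstration of this behavior, using Python's own case-sensitive comparisons as a stand-in for the equals and contains operators:

```python
# equals is case-sensitive
print("Jon" == "jon")        # False
print("Jon" == "Jon")        # True

# contains is case-sensitive too
print("Jon" in "Hello Jon")  # True
print("jon" in "Hello Jon")  # False
```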
4. Start Simple, Then Expand
- First: Test basic fields (action names, simple inputs)
- Then: Add nested field comparisons
- Finally: Use complex filters and conditions
5. Character Limit
Each parameter field is limited to 100 characters. For JSONPath expressions that exceed this:
- Break down complex paths
- Use shorter action names if possible
- Consider a different data structure
Troubleshooting
JSONPath Returns No Data
Problem: Custom evaluation shows "No actual value found"
Solutions:
- ✓ Verify the action name is correct (case-sensitive)
- ✓ Check that the field path exists in generated data
- ✓ Ensure the isReference checkbox is checked for JSONPath values
- ✓ Run the test to generate fresh data
String Comparison Fails Unexpectedly
Problem: Strings that look the same don't match
Solutions:
- ✓ Check for extra whitespace (leading/trailing spaces)
- ✓ Verify case sensitivity ("Jon" vs "jon")
- ✓ Look for special characters
- ✓ Use contains instead of equals for partial matches
Numeric Comparison Issues
Problem: Numbers don't compare correctly
Solutions:
- ✓ Verify value is a number, not a string
- ✓ Check decimal places (10.0 vs 10)
- ✓ Ensure expected value is numeric
- ✓ Use appropriate operator (>= vs >)
Custom Evaluation Not Visible
Problem: Evaluation doesn't show in results
Solutions:
- ✓ Save the test case after adding evaluation
- ✓ Verify metadata is properly formatted
- ✓ Refresh the page
- ✓ Check that column isn't hidden
Advanced Patterns
Multiple Conditions
To test multiple values from the same action:
Evaluation 1: Check recipient
Actual: $.generatedData.invokedActions[*][?(@.function.name == 'SendEmail')].function.input.to
Expected: alice@example.com
Evaluation 2: Check subject
Actual: $.generatedData.invokedActions[*][?(@.function.name == 'SendEmail')].function.input.subject
Expected: Project Update
Array Elements
Access specific array items:
$.generatedData.invokedActions[*][?(@.function.name == 'ActionName')].additionalContext[0].value
The [0] gets the first context item.
Nested Properties
Navigate deep structures:
$.generatedData.invokedActions[*][?(@.function.name == 'ComplexAction')].function.input.metadata.customField.value
Quick Reference
| Need | Type | Operator | Example Use |
|---|---|---|---|
| Exact match | String | equals | Verify recipient |
| Contains text | String | contains | Check keywords |
| Starts with | String | startswith | Validate prefix |
| Ends with | String | endswith | Check domain |
| Performance | Numeric | < or <= | Response time |
| Limits | Numeric | <= | API call count |
| Thresholds | Numeric | > or >= | Minimum value |
| Ranges | Numeric | Multiple evals | Min and max |
Additional Resources
Related Features
- Adding Test Cases - Create comprehensive test suites
- Test Case Editing - Modify and refine tests
- Conversation History - Convert interactions to tests