
Einstein Model Testing

Test and compare Einstein AI models directly from your browser with custom prompts, parameter control, sustainability metrics, and real-time cost analysis.

Model Testing

The Problem

Choosing the right AI model for your use case requires understanding the tradeoffs between cost, speed, quality, and environmental impact.

When building AI solutions, teams need to:

  • 🎯 Model Selection: Determine which model provides the best balance for specific use cases
  • 💰 Cost Optimization: Understand token consumption and pricing implications
  • 🌍 Sustainability: Monitor CO₂ emissions and water consumption
  • ⚡ Performance Testing: Compare response times across different models
  • 🎨 Quality Assessment: Evaluate output quality for your specific prompts
  • 🔧 Parameter Tuning: Experiment with temperature, max tokens, and other settings
  • 📊 Side-by-Side Comparison: Test multiple models with identical inputs

In short: You need a sandbox to experiment with different models and parameters before committing to production.


How GenAI Explorer Solves This

GenAI Explorer provides comprehensive model testing with:

Side-by-Side Comparison: Test multiple models with identical prompts simultaneously

20+ Einstein Models: Compare across providers

  • OpenAI (GPT-4o, GPT-4.1, GPT-5, O3, O4 Mini)
  • Anthropic (Claude Sonnet 4.5, Claude 3 Haiku)
  • Google (Gemini 2.5 Pro, Gemini 2.5 Flash)
  • Amazon (Nova Pro, Nova Lite)

Sustainability Metrics: Real-time environmental impact tracking

  • CO₂ emissions per request (grams)
  • Water consumption (liters)
  • Relatable equivalents (car km, smartphone charges)
  • Sustainability ratings (A+ to D)

Cost Transparency: See token usage and estimated costs in real-time

Parameter Control: Adjust and understand key settings

  • Temperature (creativity vs consistency)
  • Max tokens (response length limits)
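As a rough sketch, the per-request cost and CO₂ figures shown in the UI can be derived from token counts and per-1k-token rates. The `RATES` values here are illustrative only (taken from the model tables later in this page), not the tool's actual constants:

```python
# Illustrative per-1k-token rates (USD, grams CO2) -- assumptions drawn
# from the model comparison tables in this page, not official constants.
RATES = {
    "gpt-4o-mini": (0.0015, 0.64),
    "claude-3-haiku": (0.0008, 0.64),
}

def estimate(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Estimate total tokens, cost, and CO2 for one request."""
    usd_per_1k, co2_per_1k = RATES[model]
    total = input_tokens + output_tokens
    return {
        "tokens": total,
        "cost_usd": round(total / 1000 * usd_per_1k, 6),
        "co2_g": round(total / 1000 * co2_per_1k, 3),
    }

print(estimate("gpt-4o-mini", 120, 165))
```

Both input and output tokens count toward cost, which is why shorter prompts and tighter max-token limits reduce spend.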

Sample Prompts Library: Pre-built prompts for common scenarios

Prompt History: Automatically saves all your tests

  • Access previous prompts instantly
  • Re-run past tests with one click
  • Compare results across time

Impact: Choose the right model for each use case, cut costs by 50-70% and CO₂ emissions by up to 95% through smarter model selection, and validate quality before deployment.


Quick Start Guide

1. Access Model Testing

Navigate to Einstein Model Testing from the main menu.

2. Select Models to Compare

Choose one or more models from 20+ available options:

By Provider:

  • OpenAI/Azure: GPT-4o, GPT-4o Mini, GPT-4.1, GPT-5, O3, O4 Mini
  • Anthropic (AWS): Claude Sonnet 4.5, Claude 4, Claude 3 Haiku
  • Google: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash
  • Amazon: Nova Pro, Nova Lite

Pro Tip: Select multiple models to compare results side-by-side.

3. Enter Your Prompt

Type or paste your prompt in the text area.

Try a sample prompt:

Explain quantum computing in simple terms for a 10-year-old.

4. Generate & Compare

Click Generate to see results from all selected models side-by-side with:

  • Response content
  • Response time
  • Token usage
  • Cost estimate
  • CO₂ emissions
  • Water consumption
  • Sustainability rating

Key Features

Side-by-Side Comparison

Compare multiple models with identical inputs:

┌─────────────────────────┬─────────────────────────┬─────────────────────────┐
│ GPT-4o Mini             │ Claude 3 Haiku          │ Gemini 2.5 Flash        │
├─────────────────────────┼─────────────────────────┼─────────────────────────┤
│ Response Time: 1.8s     │ Response Time: 0.9s     │ Response Time: 1.2s     │
│ Tokens: 285             │ Tokens: 198             │ Tokens: 220             │
│ Cost: $0.0004           │ Cost: $0.0002           │ Cost: $0.0006           │
│ CO₂: 0.18g              │ CO₂: 0.13g              │ CO₂: 0.12g              │
│ 🟢 A+ Sustainability    │ 🟢 A+ Sustainability    │ 🟢 A+ Sustainability    │
│                         │                         │                         │
│ [Response content]      │ [Response content]      │ [Response content]      │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘

Metrics displayed:

  • Response time
  • Token usage (input + output)
  • Cost estimate (USD)
  • CO₂ emissions (grams)
  • Water consumption (liters)
  • Sustainability rating (A+ to D)
  • Relatable equivalents

Sustainability Tracking

Real-time environmental impact monitoring:

  • CO₂ Emissions: Measured in grams per request
  • Water Consumption: Measured in liters per request
  • Relatable Equivalents:
    • Car kilometers driven
    • Smartphone charges
    • Glasses of water
    • Tree absorption days
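These "relatable equivalents" are simple unit conversions from the per-request CO₂ figure. The emission factors below are assumptions based on rough public averages, not the tool's actual constants:

```python
# Assumed conversion factors (rough public averages, NOT the tool's
# actual constants): grams of CO2 per car-km and per smartphone charge.
CAR_G_PER_KM = 170.0      # assumed average petrol car
PHONE_CHARGE_G = 8.2      # assumed full smartphone charge

def equivalents(co2_grams: float) -> dict:
    """Convert a CO2 figure in grams into relatable equivalents."""
    return {
        "car_km": co2_grams / CAR_G_PER_KM,
        "phone_charges": co2_grams / PHONE_CHARGE_G,
    }
```

For example, a month of heavy GPT-5 usage at 137.8 kg CO₂ would correspond to roughly 800 km of driving under these assumed factors.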

Sustainability Ratings:

  • A+: Most efficient (top 20%)
  • A: Very efficient
  • B: Moderate efficiency
  • C: Higher impact
  • D: Highest impact
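One way to picture how a letter rating could be derived: bucket models by CO₂ per 1k tokens. The exact thresholds are not documented, so the cut-offs below are assumptions chosen to match the figures in the model tables later in this page:

```python
# Assumed rating thresholds (g CO2 per 1k tokens). The real cut-offs
# are not documented; these are chosen to match the model tables.
def rating(co2_g_per_1k_tokens: float) -> str:
    for cutoff, grade in [(1.0, "A+"), (2.0, "A"), (4.0, "B"), (8.0, "C")]:
        if co2_g_per_1k_tokens < cutoff:
            return grade
    return "D"
```

Under these assumed thresholds, GPT-4o Mini (0.64g) rates A+, Gemini 2.5 Pro (1.54g) rates A, and GPT-5 (13.78g) rates D, matching the tables.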

Prompt History

Automatically saves every test you run, so you never lose your work.

  • 📜 Browse History: View all previous prompts and results
  • 🔄 One-Click Re-run: Test again with saved settings
  • 📊 Compare Results: See how responses changed over time
  • 💾 Export Data: Download for analysis or reporting

Model Categories at a Glance

🚀 High Performance (Complex Tasks)

| Model | Cost/1k tokens | CO₂/1k tokens | Rating |
|-------|----------------|---------------|--------|
| GPT-5 | $0.020 | 13.78g | D |
| GPT-4.1 | $0.012 | 0.56g | A+ |
| Gemini 2.5 Pro | $0.010 | 1.54g | A |
| Claude Sonnet 4.5 | $0.018 | 1.20g | A |

⚡ Balanced (General Purpose)

| Model | Cost/1k tokens | CO₂/1k tokens | Rating |
|-------|----------------|---------------|--------|
| GPT-4o Mini | $0.0015 | 0.64g | A+ |
| GPT-4.1 Mini | $0.002 | 0.59g | A+ |
| Gemini 2.5 Flash | $0.0025 | 0.56g | A+ |
| Claude 3.7 Sonnet | $0.015 | 1.18g | A |

💰 Cost-Efficient (High Volume)

| Model | Cost/1k tokens | CO₂/1k tokens | Rating |
|-------|----------------|---------------|--------|
| Claude 3 Haiku | $0.0008 | 0.64g | A+ |
| Amazon Nova Lite | $0.0005 | 0.10g | A+ |
| Gemini 2.0 Flash Lite | $0.0007 | 0.12g | A+ |
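A common way to use these tables is to filter by sustainability rating and then pick on cost. A minimal sketch, using a subset of the figures above:

```python
# Subset of the model tables above: (name, USD/1k tokens, g CO2/1k, rating).
MODELS = [
    ("GPT-5", 0.020, 13.78, "D"),
    ("GPT-4o Mini", 0.0015, 0.64, "A+"),
    ("Claude 3 Haiku", 0.0008, 0.64, "A+"),
    ("Amazon Nova Lite", 0.0005, 0.10, "A+"),
]

def cheapest_with_rating(models, required: str = "A+") -> str:
    """Return the cheapest model that meets the required rating."""
    candidates = [m for m in models if m[3] == required]
    return min(candidates, key=lambda m: m[1])[0]

print(cheapest_with_rating(MODELS))
```

This kind of filter-then-rank selection only answers the cost and sustainability side; output quality for your specific prompts still has to be validated in the side-by-side comparison view.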

Common Use Cases

1. Model Selection for Production

Goal: Choose the right model for your production use case

Process:

  1. Test your actual use case prompts with relevant models
  2. Compare quality, speed, cost, and sustainability
  3. Run multiple variations to test consistency
  4. Choose the model that best balances your priorities

Example Decision:

  • GPT-5 for complex legal document analysis (quality critical)
  • GPT-4o Mini for general customer support (balanced)
  • Claude 3 Haiku for simple data classification (high volume)

2. Sustainability Optimization

Goal: Reduce environmental impact while maintaining quality

Process:

  1. Identify current model usage
  2. Test alternatives with A+ sustainability ratings
  3. Compare quality differences
  4. Switch to more efficient models where quality is acceptable

Example Results:

  • Switching from GPT-5 to GPT-4.1 Mini:
    • 96% reduction in CO₂ emissions
    • 90% cost savings
    • Quality still excellent for most tasks
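The reduction figures above follow directly from the per-1k-token numbers in the model tables (GPT-5: $0.020 and 13.78g; GPT-4.1 Mini: $0.002 and 0.59g):

```python
# Savings from switching GPT-5 -> GPT-4.1 Mini, per the tables above.
co2_reduction = 1 - 0.59 / 13.78    # grams CO2 per 1k tokens
cost_reduction = 1 - 0.002 / 0.020  # USD per 1k tokens

print(f"{co2_reduction:.0%} CO2 reduction, {cost_reduction:.0%} cost savings")
```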

3. Cost vs Quality Analysis

Goal: Determine if premium models justify their cost

Example Results (10,000 requests/month):

| Model | Monthly Cost | Monthly CO₂ | Quality |
|-------|--------------|-------------|---------|
| GPT-5 | $200 | 137.8 kg | 98% |
| GPT-4o Mini | $15 | 6.4 kg | 92% |
| Claude 3 Haiku | $8 | 6.4 kg | 88% |

Decision: Use GPT-4o Mini for 90% of requests, GPT-5 for complex cases only.
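A split like this is typically implemented as a simple router in front of the models. The sketch below is a hypothetical illustration; the complexity heuristic is a placeholder assumption, not part of the tool:

```python
# Hypothetical routing sketch for the decision above: default to the
# cheap model, escalate to GPT-5 only for requests flagged as complex.
# The is_complex heuristic here is a placeholder assumption.
def pick_model(prompt: str) -> str:
    is_complex = len(prompt) > 2000 or "legal" in prompt.lower()
    return "GPT-5" if is_complex else "GPT-4o Mini"
```

If roughly 90% of traffic matches the cheap path, the blended monthly cost approaches the GPT-4o Mini figure while preserving premium quality where it matters.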



Next Steps

  1. Start Simple: Begin with a basic prompt and compare 2-3 models
  2. Check Sustainability: Note the CO₂ and cost metrics
  3. Test Multiple Models: Find the best balance for your use case
  4. Monitor Impact: Track your cumulative environmental footprint
  5. Deploy Wisely: Use insights to configure production with sustainability in mind

Effective model testing leads to better AI implementations, significant cost savings, and reduced environmental impact.