Cost Optimization
Strategies to reduce AI costs by 50-95% while maintaining quality and reducing environmental impact.
Understanding Costs
Current Einstein Model Pricing
| Model | Cost per 1k Tokens | CO₂ per 1k Tokens | Rating |
|---|---|---|---|
| GPT-5 | $0.020 | 13.78g | D |
| GPT-5.1 (Beta) | $0.025 | 13.78g | D |
| Claude Sonnet 4.5 | $0.018 | 1.20g | A |
| GPT-4.1 | $0.012 | 0.56g | A+ |
| GPT-4o | $0.010 | 1.17g | A |
| Gemini 2.5 Pro | $0.010 | 1.54g | A |
| O3 (Beta) | $0.006 | 0.99g | A |
| GPT-5 Mini | $0.005 | 7.75g | B |
| Amazon Nova Pro | $0.003 | 0.50g | A+ |
| Gemini 2.5 Flash | $0.0025 | 0.56g | A+ |
| GPT-4.1 Mini | $0.002 | 0.59g | A+ |
| Gemini 2.0 Flash | $0.002 | 0.45g | A+ |
| O4 Mini (Beta) | $0.002 | 5.13g | B |
| GPT-4o Mini | $0.0015 | 0.64g | A+ |
| Claude Haiku 4.5 | $0.001 | 0.78g | A+ |
| Claude 3 Haiku | $0.0008 | 0.64g | A+ |
| Gemini 2.5 Flash Lite | $0.0008 | 0.15g | A+ |
| Gemini 2.0 Flash Lite | $0.0007 | 0.12g | A+ |
| Amazon Nova Lite | $0.0005 | 0.10g | A+ |
The Cost-Sustainability Correlation
Key Insight: Cheaper models are almost always more sustainable.
Cost ↓ = CO₂ ↓ = Water ↓
Optimizing for cost typically also optimizes for sustainability!
Strategy 1: Right-Size Your Model
The Problem
Using GPT-5 for everything = 💰💰💰 + 🌍📈
10,000 requests/month × 500 tokens × $0.020/1k = $100/month 10,000 requests/month × 500 tokens × 13.78g/1k = 69 kg CO₂/month
The Solution: Smart Model Routing
function selectModel(request) {
const complexity = analyzeComplexity(request);
switch(complexity) {
case 'CRITICAL':
return 'GPT-4.1'; // 5% - Quality + Sustainability
case 'COMPLEX':
return 'GPT-4o-Mini'; // 15% - Balanced
case 'STANDARD':
return 'Claude-3-Haiku'; // 50% - Fast & cheap
case 'SIMPLE':
return 'Amazon-Nova-Lite'; // 30% - Lowest cost
}
}
Savings Calculation
| Scenario | Cost/Month | CO₂/Month |
|---|---|---|
| Before (all GPT-5) | $100 | 69 kg |
| After (smart routing) | $8 | 3.5 kg |
| Savings | 92% | 95% |
Detailed breakdown (10,000 requests, 500 tokens avg):
| Model | % Requests | Monthly Cost | Monthly CO₂ |
|---|---|---|---|
| GPT-4.1 (5%) | 500 | $3.00 | 0.14 kg |
| GPT-4o Mini (15%) | 1,500 | $1.13 | 0.48 kg |
| Claude 3 Haiku (50%) | 5,000 | $2.00 | 1.60 kg |
| Amazon Nova Lite (30%) | 3,000 | $0.75 | 0.15 kg |
| Total | 10,000 | $6.88 | 2.37 kg |
Strategy 2: Optimize Token Usage
Reduce Input Tokens
❌ Verbose Prompts (50 tokens):
I would really appreciate it if you could help me by analyzing
this customer feedback and providing me with a detailed summary
of the main points that the customer is trying to communicate.
Here is the feedback: "{feedback}"
✅ Concise Prompts (8 tokens):
Summarize this customer feedback:
"{feedback}"
Savings: 84% fewer input tokens
Limit Output Tokens
❌ No Limit:
{
prompt: "Explain AI",
maxTokens: null // Could generate 2000+ tokens
}
Average: 800 tokens → $0.0012 with GPT-4o Mini
✅ Set Appropriate Limit:
{
prompt: "Explain AI in 2 sentences",
maxTokens: 100
}
Average: 80 tokens → $0.00012 with GPT-4o Mini
Savings: 90% fewer output tokens
Use Structured Output
❌ Free-form (800 tokens):
"Analyze this data and give me insights"
✅ Structured (50 tokens):
"Analyze this data. Format:
- Trend: [one sentence]
- Action: [one sentence]"
Savings: 94% reduction
Strategy 3: Model Selection by Use Case
Customer Support
| Approach | Model | Cost/Request | Quality |
|---|---|---|---|
| Premium | GPT-4.1 | $0.006 | 98% |
| Balanced | GPT-4o Mini | $0.00075 | 94% |
| Budget | Claude 3 Haiku | $0.0004 | 90% |
Recommendation: Start with Claude 3 Haiku, escalate to GPT-4o Mini for complex issues.
Data Classification
| Approach | Model | Cost/10k items | Accuracy |
|---|---|---|---|
| Overkill | GPT-5 | $100 | 99% |
| Optimal | GPT-4o Mini | $7.50 | 97% |
| Budget | Amazon Nova Lite | $2.50 | 94% |
Recommendation: GPT-4o Mini for best accuracy/cost balance.
Content Generation
| Approach | Model | Cost/Article | Quality |
|---|---|---|---|
| Premium | Claude Sonnet 4.5 | $0.036 | Excellent |
| Balanced | Gemini 2.5 Flash | $0.005 | Good |
| Budget | GPT-4o Mini | $0.003 | Good |
Recommendation: Gemini 2.5 Flash for content at scale.
Strategy 4: Caching & Batching
Cache Common Responses
// Check cache first
const cachedResponse = cache.get(promptHash);
if (cachedResponse) {
return cachedResponse; // $0 cost, 0 CO₂!
}
// Only call API if not cached
const response = await callAPI(prompt);
cache.set(promptHash, response, ttl=3600);
Example Savings:
- 10,000 requests/month
- 30% cache hit rate
- 3,000 cached × $0.0015 saved = $4.50/month saved
- 3,000 cached × 0.64g saved = 1.9 kg CO₂ avoided
Batch Similar Requests
❌ Individual (100 API calls):
for (const item of items) {
await processItem(item);
}
// Cost: 100 × $0.0015 = $0.15
// CO₂: 100 × 0.32g = 32g
✅ Batched (1 API call):
await processBatch(items);
// Cost: 1 × $0.015 = $0.015
// CO₂: 1 × 3.2g = 3.2g
Savings: 90% cost and CO₂ reduction
Strategy 5: Parameter Optimization
Lower Temperature for Consistency
❌ High Temperature (multiple retries):
{
temperature: 1.0,
// Average attempts: 2.5
}
Cost: 2.5 × $0.0015 = $0.00375
✅ Low Temperature (single try):
{
temperature: 0.3,
// Average attempts: 1.0
}
Cost: 1.0 × $0.0015 = $0.0015
Savings: 60%
Smart Max Tokens
Find the optimal limit:
const tokenTests = [200, 300, 400, 500];
for (const limit of tokenTests) {
const responses = await testWithLimit(limit, 20);
if (responses.allComplete && responses.quality > 0.9) {
console.log(`Optimal limit: ${limit}`);
break; // Use lowest limit that works
}
}
Real-World Case Studies
Case Study 1: Customer Support Bot
Before Optimization:
Model: GPT-4.1 for all queries
Requests: 5,000/month
Avg tokens: 600
Cost: $36/month
CO₂: 1.68 kg/month
After Optimization:
Model Routing:
- Simple FAQ (60%): Claude 3 Haiku
- Standard (30%): GPT-4o Mini
- Complex (10%): GPT-4.1
Caching: 25% hit rate
Avg tokens: 350 (optimized prompts)
Cost: $4.80/month
CO₂: 0.9 kg/month
Savings: 87% cost, 46% CO₂
Case Study 2: Document Analysis
Before Optimization:
Model: GPT-5
Documents: 1,000/month
Avg tokens: 2,000
Cost: $40/month
CO₂: 27.6 kg/month
After Optimization:
Model: GPT-4.1 (A+ rating)
Structured prompts: 1,500 tokens avg
Batching: 10 docs per request
Cost: $1.80/month
CO₂: 0.84 kg/month
Savings: 96% cost, 97% CO₂
Cost Monitoring Dashboard
Track These Metrics
const dailyMetrics = {
date: "2025-01-05",
// Volume
totalRequests: 1250,
totalTokens: 562500,
// Cost
totalCost: 1.88,
avgCostPerRequest: 0.0015,
costBudget: 5.00,
costUtilization: "38%",
// Sustainability
totalCO2Grams: 360,
avgCO2PerRequest: 0.29,
sustainabilityRating: "A+",
// Model Distribution
modelBreakdown: {
"GPT-4o-Mini": { requests: 500, cost: 0.75, co2: 160 },
"Claude-3-Haiku": { requests: 600, cost: 0.48, co2: 192 },
"Amazon-Nova-Lite": { requests: 150, cost: 0.08, co2: 8 }
}
};
// Alerts
if (dailyMetrics.totalCost > dailyMetrics.costBudget * 0.8) {
alert("Approaching daily budget limit");
}
if (dailyMetrics.totalCO2Grams > 1000) {
alert("Consider more sustainable model choices");
}
Quick Wins
Immediate Cost Reductions
| Action | Time | Savings |
|---|---|---|
| Switch simple tasks to Claude 3 Haiku | 5 min | 50-80% |
| Add max token limits | 2 min | 30-50% |
| Shorten prompts | 15 min | 20-40% |
| Enable caching | 30 min | 20-40% |
| Implement model routing | 2 hours | 60-90% |
Total Potential Savings: 70-95%
Cost Calculator
Monthly Cost Estimator
Inputs:
- Requests per month: 10,000
- Avg tokens per request: 500
- Model: GPT-4o Mini
Calculation:
10,000 × 500 × $0.0015 / 1000 = $7.50/month
With model routing (50% Claude 3 Haiku):
- 5,000 × 500 × $0.0015 / 1000 = $3.75
- 5,000 × 500 × $0.0008 / 1000 = $2.00
Total: $5.75/month (23% savings)
ROI of Optimization
| Metric | Before | After | Savings |
|---|---|---|---|
| Monthly Cost | $100 | $8 | $92/month |
| Annual Cost | $1,200 | $96 | $1,104/year |
| Monthly CO₂ | 69 kg | 3 kg | 66 kg/month |
| Annual CO₂ | 828 kg | 36 kg | 792 kg/year |
Cost Optimization Checklist
Before Deployment
- Tested with cheapest viable model
- Optimized prompt length
- Set appropriate token limits
- Implemented caching strategy
- Defined model routing rules
- Established cost monitoring
- Set budget alerts
- Tracked sustainability metrics
Monthly Review
- Review model usage distribution
- Identify high-cost use cases
- Test cheaper alternatives
- Update routing rules
- Optimize cache hit rate
- Report on cost and CO₂ trends
Related Documentation
- Model Comparison - Full pricing and sustainability data
- Sustainability Guide - Environmental impact optimization
- Best Practices - Testing strategies
- Parameters Reference - Token optimization
Implement these strategies to reduce costs by 70-95% while improving sustainability.