
Cost Optimization

Strategies to reduce AI costs by 50-95% while maintaining quality and reducing environmental impact.


Understanding Costs

Current Einstein Model Pricing

| Model | Cost per 1k Tokens | CO₂ per 1k Tokens | Rating |
|---|---|---|---|
| GPT-5 | $0.020 | 13.78g | D |
| GPT-5.1 (Beta) | $0.025 | 13.78g | D |
| Claude Sonnet 4.5 | $0.018 | 1.20g | A |
| GPT-4.1 | $0.012 | 0.56g | A+ |
| GPT-4o | $0.010 | 1.17g | A |
| Gemini 2.5 Pro | $0.010 | 1.54g | A |
| O3 (Beta) | $0.006 | 0.99g | A |
| GPT-5 Mini | $0.005 | 7.75g | B |
| Amazon Nova Pro | $0.003 | 0.50g | A+ |
| Gemini 2.5 Flash | $0.0025 | 0.56g | A+ |
| GPT-4.1 Mini | $0.002 | 0.59g | A+ |
| Gemini 2.0 Flash | $0.002 | 0.45g | A+ |
| O4 Mini (Beta) | $0.002 | 5.13g | B |
| GPT-4o Mini | $0.0015 | 0.64g | A+ |
| Claude Haiku 4.5 | $0.001 | 0.78g | A+ |
| Claude 3 Haiku | $0.0008 | 0.64g | A+ |
| Gemini 2.5 Flash Lite | $0.0008 | 0.15g | A+ |
| Gemini 2.0 Flash Lite | $0.0007 | 0.12g | A+ |
| Amazon Nova Lite | $0.0005 | 0.10g | A+ |
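The pricing table above can be turned into a simple lookup for estimating a request's footprint before sending it. This is a sketch: the `MODEL_PRICING` object and `estimate` helper are hypothetical names, and only a few rows are shown.

```javascript
// Hypothetical lookup built from the pricing table above
// ($ and grams of CO₂ per 1k tokens; extend with more rows as needed).
const MODEL_PRICING = {
  "GPT-5":            { costPer1k: 0.020,  co2Per1k: 13.78 },
  "GPT-4.1":          { costPer1k: 0.012,  co2Per1k: 0.56 },
  "GPT-4o Mini":      { costPer1k: 0.0015, co2Per1k: 0.64 },
  "Claude 3 Haiku":   { costPer1k: 0.0008, co2Per1k: 0.64 },
  "Amazon Nova Lite": { costPer1k: 0.0005, co2Per1k: 0.10 },
};

// Estimate dollar cost and CO₂ grams for a call of `tokens` tokens.
function estimate(model, tokens) {
  const p = MODEL_PRICING[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return {
    cost: (tokens / 1000) * p.costPer1k,
    co2Grams: (tokens / 1000) * p.co2Per1k,
  };
}

// A 500-token GPT-5 request: $0.010 and ~6.9 g of CO₂.
console.log(estimate("GPT-5", 500));
```

A helper like this makes the cost of a model choice visible at the call site instead of only on the monthly bill.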

The Cost-Sustainability Correlation

Key Insight: Cheaper models are almost always more sustainable.

Cost ↓ = CO₂ ↓ = Water ↓

Optimizing for cost typically also optimizes for sustainability!


Strategy 1: Right-Size Your Model

The Problem

Using GPT-5 for everything = 💰💰💰 + 🌍📈

10,000 requests/month × 500 tokens × $0.020/1k = $100/month
10,000 requests/month × 500 tokens × 13.78g/1k = 69 kg CO₂/month

The Solution: Smart Model Routing

function selectModel(request) {
  const complexity = analyzeComplexity(request);

  switch (complexity) {
    case 'CRITICAL':
      return 'GPT-4.1';          // ~5% of traffic - quality + sustainability
    case 'COMPLEX':
      return 'GPT-4o-Mini';      // ~15% - balanced
    case 'STANDARD':
      return 'Claude-3-Haiku';   // ~50% - fast & cheap
    case 'SIMPLE':
      return 'Amazon-Nova-Lite'; // ~30% - lowest cost
  }
}
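`analyzeComplexity` is left undefined above. One possible heuristic is sketched below; the request fields (`requiresReasoning`, `priority`, `attachments`) are hypothetical, and production routers often use a small classifier model instead of hand-written rules.

```javascript
// Hypothetical heuristic for the analyzeComplexity helper used by
// selectModel. Rules are illustrative, not a real classifier.
function analyzeComplexity(request) {
  const text = request.prompt || "";
  if (request.requiresReasoning || request.priority === "high") return "CRITICAL";
  if (text.length > 2000 || request.attachments?.length > 0) return "COMPLEX";
  if (text.length > 300) return "STANDARD";
  return "SIMPLE";
}

console.log(analyzeComplexity({ prompt: "What are your opening hours?" })); // "SIMPLE"
```

Whatever signal is used, it should be cheap to compute, since it runs on every request.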

Savings Calculation

| Scenario | Cost/Month | CO₂/Month |
|---|---|---|
| Before (all GPT-5) | $100 | 69 kg |
| After (smart routing) | $6.88 | 2.37 kg |
| Savings | 93% | 97% |

Detailed breakdown (10,000 requests, 500 tokens avg):

| Model | Requests | Monthly Cost | Monthly CO₂ |
|---|---|---|---|
| GPT-4.1 (5%) | 500 | $3.00 | 0.14 kg |
| GPT-4o Mini (15%) | 1,500 | $1.13 | 0.48 kg |
| Claude 3 Haiku (50%) | 5,000 | $2.00 | 1.60 kg |
| Amazon Nova Lite (30%) | 3,000 | $0.75 | 0.15 kg |
| Total | 10,000 | $6.88 | 2.37 kg |
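The breakdown totals can be reproduced in a few lines, using the per-1k rates from the pricing table:

```javascript
// Recompute the routed-traffic breakdown above
// (10,000 requests/month × 500 tokens each).
const mix = [
  { model: "GPT-4.1",          share: 0.05, costPer1k: 0.012,  co2Per1k: 0.56 },
  { model: "GPT-4o Mini",      share: 0.15, costPer1k: 0.0015, co2Per1k: 0.64 },
  { model: "Claude 3 Haiku",   share: 0.50, costPer1k: 0.0008, co2Per1k: 0.64 },
  { model: "Amazon Nova Lite", share: 0.30, costPer1k: 0.0005, co2Per1k: 0.10 },
];

const requests = 10000, tokensPerRequest = 500;
let cost = 0, co2Grams = 0;
for (const m of mix) {
  const kTokens = (requests * m.share * tokensPerRequest) / 1000;
  cost += kTokens * m.costPer1k;
  co2Grams += kTokens * m.co2Per1k;
}

console.log(cost.toFixed(2));              // "6.88"
console.log((co2Grams / 1000).toFixed(2)); // "2.37" (kg)
```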

Strategy 2: Optimize Token Usage

Reduce Input Tokens

❌ Verbose Prompts (50 tokens):

I would really appreciate it if you could help me by analyzing 
this customer feedback and providing me with a detailed summary
of the main points that the customer is trying to communicate.
Here is the feedback: "{feedback}"

✅ Concise Prompts (8 tokens):

Summarize this customer feedback:
"{feedback}"

Savings: 84% fewer input tokens

Limit Output Tokens

❌ No Limit:

{
  prompt: "Explain AI",
  maxTokens: null // could generate 2,000+ tokens
}

Average: 800 tokens → $0.0012 with GPT-4o Mini

✅ Set Appropriate Limit:

{
  prompt: "Explain AI in 2 sentences",
  maxTokens: 100
}

Average: 80 tokens → $0.00012 with GPT-4o Mini

Savings: 90% fewer output tokens

Use Structured Output

❌ Free-form (800 tokens):

"Analyze this data and give me insights"

✅ Structured (50 tokens):

"Analyze this data. Format:
- Trend: [one sentence]
- Action: [one sentence]"

Savings: 94% reduction
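The three tactics above (a concise instruction, a structured output format, and an explicit token limit) combine naturally into one request builder. The sketch below assumes an API that accepts `prompt` and `maxTokens` fields; the field names are illustrative.

```javascript
// Hedged sketch: one request builder applying all of Strategy 2.
function buildRequest(data) {
  return {
    prompt: [
      "Analyze this data. Format:",
      "- Trend: [one sentence]",
      "- Action: [one sentence]",
      "",
      data,
    ].join("\n"),
    maxTokens: 100,   // caps output cost; raise only if replies truncate
    temperature: 0.3, // consistent output, fewer retries (see Strategy 5)
  };
}

const req = buildRequest("Sales dipped 4% in March.");
console.log(req.maxTokens); // 100
```

Centralizing prompt construction this way also makes later optimizations a one-line change instead of a hunt through the codebase.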


Strategy 3: Model Selection by Use Case

Customer Support

| Approach | Model | Cost/Request | Quality |
|---|---|---|---|
| Premium | GPT-4.1 | $0.006 | 98% |
| Balanced | GPT-4o Mini | $0.00075 | 94% |
| Budget | Claude 3 Haiku | $0.0004 | 90% |

Recommendation: Start with Claude 3 Haiku, escalate to GPT-4o Mini for complex issues.
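The recommended escalation can be sketched as a two-tier call. Here `callModel` and the `confidence` field are placeholders for whatever client and quality signal the application actually has, not a real API.

```javascript
// Sketch of the escalation pattern: try the cheap model first,
// retry with a stronger one only when the answer looks unreliable.
async function answerTicket(ticket, callModel) {
  const cheap = await callModel("Claude-3-Haiku", ticket);
  if (cheap.confidence >= 0.8) return cheap; // most traffic stops here
  return callModel("GPT-4o-Mini", ticket);   // escalate the hard cases
}
```

Even with a 20% escalation rate, the blended cost stays far below running every ticket on the premium model.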

Data Classification

| Approach | Model | Cost/10k Items | Accuracy |
|---|---|---|---|
| Overkill | GPT-5 | $100 | 99% |
| Optimal | GPT-4o Mini | $7.50 | 97% |
| Budget | Amazon Nova Lite | $2.50 | 94% |

Recommendation: GPT-4o Mini for best accuracy/cost balance.

Content Generation

| Approach | Model | Cost/Article | Quality |
|---|---|---|---|
| Premium | Claude Sonnet 4.5 | $0.036 | Excellent |
| Balanced | Gemini 2.5 Flash | $0.005 | Good |
| Budget | GPT-4o Mini | $0.003 | Good |

Recommendation: Gemini 2.5 Flash for content at scale.


Strategy 4: Caching & Batching

Cache Common Responses

// Check the cache first
const cachedResponse = cache.get(promptHash);
if (cachedResponse) {
  return cachedResponse; // $0 cost, 0 CO₂!
}

// Only call the API on a cache miss
const response = await callAPI(prompt);
cache.set(promptHash, response, 3600); // TTL in seconds

Example Savings:

  • 10,000 requests/month, ~1,000 tokens each on GPT-4o Mini
  • 30% cache hit rate
  • 3,000 cached × $0.0015 saved = $4.50/month saved
  • 3,000 cached × 0.64g saved = 1.9 kg CO₂ avoided
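The `cache` object in the snippet above is assumed to exist. A minimal in-memory version with the same `get`/`set` shape might look like this; production systems would more likely use Redis or a similar shared store.

```javascript
// Minimal in-memory TTL cache (sketch; not production-grade).
class TTLCache {
  constructor() {
    this.store = new Map();
  }

  // Store a value that expires after ttlSeconds.
  set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }

  // Return the value, or undefined on a miss or after expiry.
  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { // expired: drop it and report a miss
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

const cache = new TTLCache();
cache.set("promptHash123", "cached answer", 3600);
console.log(cache.get("promptHash123")); // "cached answer"
```

The TTL matters: too short and the hit rate collapses; too long and users may see stale answers.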

Batch Similar Requests

❌ Individual (100 API calls):

for (const item of items) {
  await processItem(item);
}
// Cost: 100 × $0.0015 = $0.15
// CO₂: 100 × 0.64g = 64g

✅ Batched (1 API call):

await processBatch(items);
// Cost: 1 × $0.015 = $0.015
// CO₂: 1 × 6.4g = 6.4g

Savings: 90% cost and CO₂ reduction
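`processBatch` is left undefined above. One hedged way to implement it is to pack the items into a single numbered prompt and split the response; `callAPI` here is a placeholder for the real client.

```javascript
// Hypothetical sketch of processBatch: many items, one request.
// The shared instruction is paid for once instead of per item.
async function processBatch(items, callAPI) {
  const prompt =
    "Classify each line as POSITIVE or NEGATIVE. " +
    "Reply with one label per line, in order:\n" +
    items.map((item, i) => `${i + 1}. ${item}`).join("\n");

  const response = await callAPI(prompt); // one request for all items
  return response.split("\n").map((line) => line.trim());
}
```

Batching works best for uniform tasks (classification, extraction); mixed tasks in one prompt tend to degrade quality.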


Strategy 5: Parameter Optimization

Lower Temperature for Consistency

❌ High Temperature (multiple retries):

{
  temperature: 1.0,
  // Average attempts: 2.5
}

Cost: 2.5 × $0.0015 = $0.00375

✅ Low Temperature (single try):

{
  temperature: 0.3,
  // Average attempts: 1.0
}

Cost: 1.0 × $0.0015 = $0.0015

Savings: 60%

Smart Max Tokens

Find the optimal limit:

const tokenTests = [200, 300, 400, 500];

for (const limit of tokenTests) {
  const responses = await testWithLimit(limit, 20); // 20 sample prompts per limit

  if (responses.allComplete && responses.quality > 0.9) {
    console.log(`Optimal limit: ${limit}`);
    break; // use the lowest limit that still works
  }
}
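`testWithLimit` above is not defined. A self-contained version of the sweep takes the evaluator as an argument; the evaluator itself is assumed to run sample prompts at the given limit and report whether responses completed and how they scored.

```javascript
// Self-contained limit sweep. `evaluate(limit, samples)` is a
// caller-supplied async function (assumed, not a real API) returning
// { allComplete, quality }.
async function findOptimalLimit(limits, evaluate) {
  for (const limit of limits) {
    const r = await evaluate(limit, 20); // 20 sample prompts per limit
    if (r.allComplete && r.quality > 0.9) return limit; // lowest limit that works
  }
  return null; // nothing passed; raise the ceiling and retry
}
```

Returning `null` explicitly, rather than silently using the largest limit, forces the caller to decide what "no limit was good enough" should mean.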

Real-World Case Studies

Case Study 1: Customer Support Bot

Before Optimization:

Model: GPT-4.1 for all queries
Requests: 5,000/month
Avg tokens: 600
Cost: $36/month
CO₂: 1.68 kg/month

After Optimization:

Model Routing:
- Simple FAQ (60%): Claude 3 Haiku
- Standard (30%): GPT-4o Mini
- Complex (10%): GPT-4.1

Caching: 25% hit rate
Avg tokens: 350 (optimized prompts)

Cost: $4.80/month
CO₂: 0.9 kg/month

Savings: 87% cost, 46% CO₂

Case Study 2: Document Analysis

Before Optimization:

Model: GPT-5
Documents: 1,000/month
Avg tokens: 2,000
Cost: $40/month
CO₂: 27.6 kg/month

After Optimization:

Model: GPT-4.1 (A+ rating)
Structured prompts: 1,500 tokens avg
Batching: 10 docs per request

Cost: $18/month
CO₂: 0.84 kg/month

Savings: 55% cost, 97% CO₂


Cost Monitoring Dashboard

Track These Metrics

const dailyMetrics = {
  date: "2025-01-05",

  // Volume
  totalRequests: 1250,
  totalTokens: 562500,

  // Cost
  totalCost: 1.31,
  avgCostPerRequest: 0.001,
  costBudget: 5.00,
  costUtilization: "26%",

  // Sustainability
  totalCO2Grams: 360,
  avgCO2PerRequest: 0.29,
  sustainabilityRating: "A+",

  // Model Distribution
  modelBreakdown: {
    "GPT-4o-Mini":      { requests: 500, cost: 0.75, co2: 160 },
    "Claude-3-Haiku":   { requests: 600, cost: 0.48, co2: 192 },
    "Amazon-Nova-Lite": { requests: 150, cost: 0.08, co2: 8 }
  }
};

// Alerts
if (dailyMetrics.totalCost > dailyMetrics.costBudget * 0.8) {
  alert("Approaching daily budget limit");
}

if (dailyMetrics.totalCO2Grams > 1000) {
  alert("Consider more sustainable model choices");
}

Quick Wins

Immediate Cost Reductions

| Action | Time | Savings |
|---|---|---|
| Switch simple tasks to Claude 3 Haiku | 5 min | 50-80% |
| Add max token limits | 2 min | 30-50% |
| Shorten prompts | 15 min | 20-40% |
| Enable caching | 30 min | 20-40% |
| Implement model routing | 2 hours | 60-90% |

Total Potential Savings: 70-95%


Cost Calculator

Monthly Cost Estimator

Inputs:
- Requests per month: 10,000
- Avg tokens per request: 500
- Model: GPT-4o Mini

Calculation:
10,000 × 500 × $0.0015 / 1000 = $7.50/month

With model routing (50% Claude 3 Haiku):
- 5,000 × 500 × $0.0015 / 1000 = $3.75
- 5,000 × 500 × $0.0008 / 1000 = $2.00
Total: $5.75/month (23% savings)
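The estimator above can be written as a small function; rates are $ per 1k tokens from the pricing table, and routing shares should sum to 1.

```javascript
// Monthly cost estimator for a (possibly routed) traffic mix.
// routes: [{ share, costPer1k }], shares summing to 1.
function monthlyCost(requests, tokensPerRequest, routes) {
  return routes.reduce(
    (sum, r) =>
      sum + (requests * r.share * tokensPerRequest * r.costPer1k) / 1000,
    0
  );
}

// Single model (GPT-4o Mini): $7.50/month
console.log(monthlyCost(10000, 500, [{ share: 1, costPer1k: 0.0015 }]));

// 50/50 split with Claude 3 Haiku: $5.75/month
console.log(
  monthlyCost(10000, 500, [
    { share: 0.5, costPer1k: 0.0015 }, // GPT-4o Mini
    { share: 0.5, costPer1k: 0.0008 }, // Claude 3 Haiku
  ])
);
```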

ROI of Optimization

| Metric | Before | After | Savings |
|---|---|---|---|
| Monthly Cost | $100 | $8 | $92/month |
| Annual Cost | $1,200 | $96 | $1,104/year |
| Monthly CO₂ | 69 kg | 3 kg | 66 kg/month |
| Annual CO₂ | 828 kg | 36 kg | 792 kg/year |

Cost Optimization Checklist

Before Deployment

  • Tested with cheapest viable model
  • Optimized prompt length
  • Set appropriate token limits
  • Implemented caching strategy
  • Defined model routing rules
  • Established cost monitoring
  • Set budget alerts
  • Tracked sustainability metrics

Monthly Review

  • Review model usage distribution
  • Identify high-cost use cases
  • Test cheaper alternatives
  • Update routing rules
  • Optimize cache hit rate
  • Report on cost and CO₂ trends


Implement these strategies to reduce costs by 70-95% while improving sustainability.