Cost Optimization

Strategies to reduce AI costs by 50-95% while maintaining quality and reducing environmental impact.

Understanding Costs

Current Einstein Model Pricing

Model	Cost per 1k Tokens	CO₂ per 1k Tokens	Sustainability Rating
GPT-5	$0.020	13.78g	D
GPT-5.1 (Beta)	$0.025	13.78g	D
Claude Sonnet 4.5	$0.018	1.20g	A
GPT-4.1	$0.012	0.56g	A+
GPT-4o	$0.010	1.17g	A
Gemini 2.5 Pro	$0.010	1.54g	A
O3 (Beta)	$0.006	0.99g	A
GPT-5 Mini	$0.005	7.75g	B
Amazon Nova Pro	$0.003	0.50g	A+
Gemini 2.5 Flash	$0.0025	0.56g	A+
GPT-4.1 Mini	$0.002	0.59g	A+
Gemini 2.0 Flash	$0.002	0.45g	A+
O4 Mini (Beta)	$0.002	5.13g	B
GPT-4o Mini	$0.0015	0.64g	A+
Claude Haiku 4.5	$0.001	0.78g	A+
Claude 3 Haiku	$0.0008	0.64g	A+
Gemini 2.5 Flash Lite	$0.0008	0.15g	A+
Gemini 2.0 Flash Lite	$0.0007	0.12g	A+
Amazon Nova Lite	$0.0005	0.10g	A+

The Cost-Sustainability Correlation

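Key Insight: Cheaper models are usually, but not always, more sustainable. In the table above, GPT-5 Mini and O4 Mini (Beta) cost less than GPT-4.1 yet emit roughly 9 to 14 times more CO₂ per 1k tokens, so check both columns before switching.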

Cost ↓ = CO₂ ↓ = Water ↓

Optimizing for cost typically also optimizes for sustainability!

Strategy 1: Right-Size Your Model

The Problem

Using GPT-5 for everything = 💰💰💰 + 🌍📈

10,000 requests/month × 500 tokens × $0.020/1k = $100/month
10,000 requests/month × 500 tokens × 13.78g/1k = 69 kg CO₂/month
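The same arithmetic applies to any row of the pricing table. A minimal sketch, assuming a hypothetical `PRICING` lookup and `estimateMonthly` helper (illustrative names, not part of any platform API) filled in with a few rows from the table:

```javascript
// Illustrative lookup built from the pricing table above (rates per 1k tokens).
// PRICING and estimateMonthly are hypothetical names, not a platform API.
const PRICING = {
  'GPT-5': { costPer1k: 0.020, co2GramsPer1k: 13.78 },
  'GPT-4.1': { costPer1k: 0.012, co2GramsPer1k: 0.56 },
  'GPT-4o Mini': { costPer1k: 0.0015, co2GramsPer1k: 0.64 },
  'Claude 3 Haiku': { costPer1k: 0.0008, co2GramsPer1k: 0.64 },
  'Amazon Nova Lite': { costPer1k: 0.0005, co2GramsPer1k: 0.10 },
};

function estimateMonthly(model, requestsPerMonth, avgTokensPerRequest) {
  const rates = PRICING[model];
  if (!rates) {
    throw new Error(`No pricing data for model: ${model}`);
  }
  const thousandsOfTokens = (requestsPerMonth * avgTokensPerRequest) / 1000;
  return {
    costUsd: thousandsOfTokens * rates.costPer1k,
    co2Kg: (thousandsOfTokens * rates.co2GramsPer1k) / 1000,
  };
}

// Reproduces the all-GPT-5 scenario: about $100 and 69 kg CO₂ per month.
const gpt5 = estimateMonthly('GPT-5', 10000, 500);
console.log(gpt5.costUsd.toFixed(2), gpt5.co2Kg.toFixed(1));
```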

The Solution: Smart Model Routing

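// analyzeComplexity is assumed to be defined elsewhere and to return
// one of 'CRITICAL', 'COMPLEX', 'STANDARD', or 'SIMPLE'.
function selectModel(request) {
  const complexity = analyzeComplexity(request);

  switch (complexity) {
    case 'CRITICAL':
      return 'GPT-4.1';           // 5% of requests - Quality + Sustainability
    case 'COMPLEX':
      return 'GPT-4o-Mini';       // 15% of requests - Balanced
    case 'STANDARD':
      return 'Claude-3-Haiku';    // 50% of requests - Fast & cheap
    case 'SIMPLE':
      return 'Amazon-Nova-Lite';  // 30% of requests - Lowest cost
    default:
      return 'Claude-3-Haiku';    // Unrecognized level: fall back to the standard tier
  }
}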
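The routing function depends on `analyzeComplexity`, which this guide does not define. One possible heuristic is sketched below. The keyword lists, the word-count thresholds, and the assumption that the request exposes a `prompt` string are all illustrative placeholders that would need tuning against real traffic.

```javascript
// Hypothetical heuristic: keywords and thresholds are placeholders,
// not values taken from this guide or from any platform API.
const CRITICAL_KEYWORDS = ['legal', 'contract', 'compliance'];
const COMPLEX_KEYWORDS = ['analyze', 'compare', 'explain why'];

function analyzeComplexity(request) {
  // Assumes the request carries its text in a `prompt` field.
  const text = String((request && request.prompt) || '').toLowerCase();
  const wordCount = text.split(/\s+/).filter(Boolean).length;

  if (CRITICAL_KEYWORDS.some((keyword) => text.includes(keyword))) {
    return 'CRITICAL';
  }
  if (wordCount > 200 || COMPLEX_KEYWORDS.some((keyword) => text.includes(keyword))) {
    return 'COMPLEX';
  }
  if (wordCount > 30) {
    return 'STANDARD';
  }
  return 'SIMPLE';
}

console.log(analyzeComplexity({ prompt: 'What are your opening hours?' }));
```

Keyword rules are cheap to run but easy to fool, so it is worth logging the chosen level next to a quality signal and adjusting the thresholds over time.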

Savings Calculation

Scenario	Cost/Month	CO₂/Month
Before (all GPT-5)	$100	69 kg
After (smart routing)	$6.88	2.37 kg
Savings	93%	97%

Detailed breakdown (10,000 requests, 500 tokens avg):

Model (share of requests)	Requests	Monthly Cost	Monthly CO₂
GPT-4.1 (5%)	500	$3.00	0.14 kg
GPT-4o Mini (15%)	1,500	$1.13	0.48 kg
Claude 3 Haiku (50%)	5,000	$2.00	1.60 kg
Amazon Nova Lite (30%)	3,000	$0.75	0.15 kg
Total	10,000	$6.88	2.37 kg
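The breakdown can be reproduced programmatically, which helps when the traffic shares change. A minimal sketch, assuming a hypothetical `ROUTES` array that pairs each share with the per-1k rates from the pricing table:

```javascript
// Traffic shares and per-1k-token rates copied from the tables above.
// ROUTES and estimateRoutedTotals are hypothetical names for illustration.
const ROUTES = [
  { model: 'GPT-4.1', share: 0.05, costPer1k: 0.012, co2GramsPer1k: 0.56 },
  { model: 'GPT-4o Mini', share: 0.15, costPer1k: 0.0015, co2GramsPer1k: 0.64 },
  { model: 'Claude 3 Haiku', share: 0.50, costPer1k: 0.0008, co2GramsPer1k: 0.64 },
  { model: 'Amazon Nova Lite', share: 0.30, costPer1k: 0.0005, co2GramsPer1k: 0.10 },
];

function estimateRoutedTotals(routes, requestsPerMonth, avgTokensPerRequest) {
  let costUsd = 0;
  let co2Kg = 0;
  for (const route of routes) {
    const thousandsOfTokens =
      (requestsPerMonth * route.share * avgTokensPerRequest) / 1000;
    costUsd += thousandsOfTokens * route.costPer1k;
    co2Kg += (thousandsOfTokens * route.co2GramsPer1k) / 1000;
  }
  return { costUsd, co2Kg };
}

const routed = estimateRoutedTotals(ROUTES, 10000, 500);
console.log(routed.costUsd.toFixed(3), routed.co2Kg.toFixed(2));
```

With the shares shown, the totals come out at roughly $6.88 and 2.37 kg, matching the Total row.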

Strategy 2: Optimize Token Usage

Reduce Input Tokens

❌ Verbose Prompts (50 tokens):

I would really appreciate it if you could help me by analyzing 
this customer feedback and providing me with a detailed summary 
of the main points that the customer is trying to communicate. 
Here is the feedback: "{feedback}"

✅ Concise Prompts (8 tokens):

Summarize this customer feedback:
"{feedback}"

Savings: 84% fewer input tokens
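To compare prompt variants before sending them, a rough estimate is often enough. The sketch below assumes the common rule of thumb of about four characters per token for English text. It is an approximation rather than a tokenizer, so its output will not exactly match the counts quoted above and actual counts vary by model.

```javascript
// Rough approximation only: ~4 characters per token for English text.
// Use the model provider's tokenizer when exact counts matter.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Percentage of tokens saved when going from `before` to `after`.
function tokenReductionPercent(before, after) {
  if (before <= 0) {
    throw new RangeError('before must be a positive token count');
  }
  return Math.round(((before - after) / before) * 100);
}

const verbose =
  'I would really appreciate it if you could help me by analyzing ' +
  'this customer feedback and providing me with a detailed summary ' +
  'of the main points that the customer is trying to communicate. ' +
  'Here is the feedback:';
const concise = 'Summarize this customer feedback:';

console.log(estimateTokens(verbose), estimateTokens(concise));
console.log(tokenReductionPercent(50, 8)); // 84, the figure quoted above
```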

Limit Output Tokens

❌ No Limit:

{
  prompt: "Explain AI",
  maxTokens: null  // Could generate 2000+ tokens
}

Average: 800 tokens → $0.0012 with GPT-4o Mini

✅ Set Appropriate Limit:

{
  prompt: "Explain AI in 2 sentences",
  maxTokens: 100
}

Average: 80 tokens → $0.00012 with GPT-4o Mini

Savings: 90% fewer output tokens

Use Structured Output

❌ Free-form (800 tokens):

"Analyze this data and give me insights"

✅ Structured (50 tokens):

"Analyze this data. Format:
- Trend: [one sentence]
- Action: [one sentence]"

Savings: 94% reduction
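A fixed format also makes the response easy to parse. A minimal sketch, assuming the model follows the `- Label: text` layout requested above. `parseStructuredInsights` is a hypothetical helper, and real responses may need more defensive handling.

```javascript
// Parses lines such as "- Trend: Sales rose 12%." into { trend: "Sales rose 12%." }.
// Lines that do not match the "- Label: text" pattern are ignored.
function parseStructuredInsights(responseText) {
  const result = {};
  for (const line of responseText.split(/\r?\n/)) {
    const match = line.match(/^\s*-\s*([A-Za-z]+):\s*(.+)$/);
    if (match) {
      result[match[1].toLowerCase()] = match[2].trim();
    }
  }
  return result;
}

const sample =
  '- Trend: Sales rose 12% quarter over quarter.\n' +
  '- Action: Increase inventory for Q3.';
console.log(parseStructuredInsights(sample));
```

If an expected key is missing from the result, the response did not follow the format and can be retried or flagged instead of passed downstream.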

Strategy 3: Model Selection by Use Case

Customer Support

Approach	Model	Cost/Request	Quality
Premium	GPT-4.1	$0.006	98%
Balanced	GPT-4o Mini	$0.00075	94%
Budget	Claude 3 Haiku	$0.0004	90%

Recommendation: Start with Claude 3 Haiku, escalate to GPT-4o Mini for complex issues.
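The escalation pattern can be sketched as follows. `callModel` and `isGoodEnough` are hypothetical, caller-supplied functions (this guide does not define an escalation API), so the example shows only the control flow.

```javascript
// Tries the budget model first and escalates only when the answer is rejected.
// callModel(model, question) -> Promise<string> and isGoodEnough(answer) -> boolean
// are assumptions supplied by the caller, not part of any specific SDK.
async function answerWithEscalation(question, callModel, isGoodEnough) {
  const tiers = ['Claude-3-Haiku', 'GPT-4o-Mini'];
  let answer = null;
  let modelUsed = null;

  for (const model of tiers) {
    answer = await callModel(model, question);
    modelUsed = model;
    if (isGoodEnough(answer)) {
      break; // Stop at the cheapest tier that produces an acceptable answer
    }
  }
  return { answer, modelUsed };
}
```

An escalated request pays for both calls, so the approach saves money only when the acceptance check rejects a minority of budget-tier answers.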

Data Classification

Approach	Model	Cost/10k items	Accuracy
Overkill	GPT-5	$100	99%
Optimal	GPT-4o Mini	$7.50	97%
Budget	Amazon Nova Lite	$2.50	94%

Recommendation: GPT-4o Mini for best accuracy/cost balance.

Content Generation

Approach	Model	Cost/Article	Quality
Premium	Claude Sonnet 4.5	$0.036	Excellent
Balanced	Gemini 2.5 Flash	$0.005	Good
Budget	GPT-4o Mini	$0.003	Good

Recommendation: Gemini 2.5 Flash for content at scale.

Strategy 4: Caching & Batching

Cache Common Responses

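async function getResponse(prompt, promptHash) {
  // Check cache first
  const cachedResponse = cache.get(promptHash);
  if (cachedResponse) {
    return cachedResponse;  // $0 cost, 0 CO₂!
  }

  // Only call API if not cached
  const response = await callAPI(prompt);
  cache.set(promptHash, response, 3600);  // TTL of 3600 seconds; argument form depends on your cache library
  return response;
}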

Example Savings (GPT-4o Mini at about 1,000 tokens per request):

10,000 requests/month
30% cache hit rate
3,000 cached × $0.0015 saved = $4.50/month saved
3,000 cached × 0.64g saved = 1.9 kg CO₂ avoided
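The caching snippet assumes a `cache` object with `get` and `set`. If no cache library is in place, a small in-memory version can be sketched as below. `TtlCache` and `cacheKey` are hypothetical names, the cache lives in a single process only, and the injectable `now` function exists so that expiry can be tested without waiting.

```javascript
// Minimal in-memory cache with per-entry expiry. Not shared across processes.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map();
    this.now = now; // Injectable clock, in milliseconds
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // Drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }
}

// Collapsing whitespace lets trivially different prompts share one entry.
function cacheKey(model, prompt) {
  return `${model}:${prompt.trim().replace(/\s+/g, ' ')}`;
}

const cache = new TtlCache();
cache.set(cacheKey('GPT-4o-Mini', 'Explain AI  in 2 sentences'), 'cached answer', 3600);
console.log(cache.get(cacheKey('GPT-4o-Mini', 'Explain AI in 2 sentences')));
```

Including the model name in the key avoids serving one model's answer for another model's request.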

Batch Similar Requests

❌ Individual (100 API calls):

for (const item of items) {
  await processItem(item);
}
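// Cost: 100 × $0.0015 = $0.15 (GPT-4o Mini, ~1,000 tokens per call)
// CO₂: 100 × 0.64g = 64g

✅ Batched (1 API call):

await processBatch(items);
// Cost: 1 × $0.015 = $0.015
// CO₂: 1 × 6.4g = 6.4g

Savings: 90% cost and CO₂ reduction in this example; actual savings depend on how much of each request is shared instructions.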
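`processBatch` is not defined in this guide. One way to build it is to number the items in a single prompt and then split the numbered reply. The sketch below assumes the model answers with one line per item in the form `N. result`, which should be validated in practice; both helper names are hypothetical.

```javascript
// Builds one prompt covering every item, so shared instructions are sent once.
function buildBatchPrompt(instruction, items) {
  const numbered = items.map((item, index) => `${index + 1}. ${item}`);
  return (
    `${instruction}\n` +
    'Answer with one line per item, formatted as "N. result".\n\n' +
    numbered.join('\n')
  );
}

// Maps a reply like "1. positive\n2. negative" back onto the original item order.
// Items the model skipped come back as null so the caller can retry them.
function parseBatchResponse(responseText, itemCount) {
  const results = new Array(itemCount).fill(null);
  for (const line of responseText.split(/\r?\n/)) {
    const match = line.match(/^\s*(\d+)\.\s*(.+)$/);
    if (!match) continue;
    const index = Number(match[1]) - 1;
    if (index >= 0 && index < itemCount) {
      results[index] = match[2].trim();
    }
  }
  return results;
}

console.log(buildBatchPrompt('Classify the sentiment of each review.', ['Great product', 'Arrived broken']));
console.log(parseBatchResponse('1. positive\n2. negative', 2));
```

Very large batches can exceed the model's context window or degrade per-item quality, so batch size is worth testing rather than maximizing.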

Strategy 5: Parameter Optimization

Lower Temperature for Consistency

❌ High Temperature (multiple retries):

{
  temperature: 1.0,
  // Average attempts: 2.5
}

Cost: 2.5 × $0.0015 = $0.00375

✅ Low Temperature (single try):

{
  temperature: 0.3,
  // Average attempts: 1.0
}

Cost: 1.0 × $0.0015 = $0.0015

Savings: 60%

Smart Max Tokens

Find the optimal limit:

const tokenTests = [200, 300, 400, 500];

for (const limit of tokenTests) {
  const responses = await testWithLimit(limit, 20);
  
  if (responses.allComplete && responses.quality > 0.9) {
    console.log(`Optimal limit: ${limit}`);
    break;  // Use lowest limit that works
  }
}
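`testWithLimit` is left undefined above. Its aggregation step might look like the following sketch, which assumes each trial records whether the response was cut off (`truncated`) and a quality `score` between 0 and 1 from whatever evaluation is in use. Both field names and the helper name are hypothetical.

```javascript
// Summarizes trial runs into the shape the loop above checks.
// Each trial is assumed to look like { truncated: boolean, score: number }.
function summarizeTrials(trials) {
  if (trials.length === 0) {
    return { allComplete: false, quality: 0 };
  }
  const totalScore = trials.reduce((sum, trial) => sum + trial.score, 0);
  return {
    allComplete: trials.every((trial) => !trial.truncated),
    quality: totalScore / trials.length,
  };
}

console.log(summarizeTrials([
  { truncated: false, score: 0.95 },
  { truncated: true, score: 0.6 },
]));
```

Treating an empty trial set as a failure keeps the search loop from accepting a limit that was never actually exercised.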

Real-World Case Studies

Case Study 1: Customer Support Bot

Before Optimization:

Model: GPT-4.1 for all queries
Requests: 5,000/month
Avg tokens: 600
Cost: $36/month
CO₂: 1.68 kg/month

After Optimization:

Model Routing:
- Simple FAQ (60%): Claude 3 Haiku
- Standard (30%): GPT-4o Mini
- Complex (10%): GPT-4.1

Caching: 25% hit rate
Avg tokens: 350 (optimized prompts)

Cost: $4.80/month
CO₂: 0.9 kg/month

Savings: 87% cost, 46% CO₂

Case Study 2: Document Analysis

Before Optimization:

Model: GPT-5
Documents: 1,000/month
Avg tokens: 2,000
Cost: $40/month
CO₂: 27.6 kg/month

After Optimization:

Model: GPT-4.1 (A+ rating)
Structured prompts: 1,500 tokens avg
Batching: 10 docs per request

Cost: $1.80/month
CO₂: 0.84 kg/month

Savings: 96% cost, 97% CO₂

Cost Monitoring Dashboard

Track These Metrics

const dailyMetrics = {
  date: "2025-01-05",
  
  // Volume
  totalRequests: 1250,
  totalTokens: 562500,
  
  // Cost
  totalCost: 1.31,
  avgCostPerRequest: 0.00105,
  costBudget: 5.00,
  costUtilization: "26%",
  
  // Sustainability
  totalCO2Grams: 360,
  avgCO2PerRequest: 0.29,
  sustainabilityRating: "A+",
  
  // Model Distribution
  modelBreakdown: {
    "GPT-4o-Mini": { requests: 500, cost: 0.75, co2: 160 },
    "Claude-3-Haiku": { requests: 600, cost: 0.48, co2: 192 },
    "Amazon-Nova-Lite": { requests: 150, cost: 0.08, co2: 8 }
  }
};

// Alerts
if (dailyMetrics.totalCost > dailyMetrics.costBudget * 0.8) {
  alert("Approaching daily budget limit");
}

if (dailyMetrics.totalCO2Grams > 1000) {
  alert("Consider more sustainable model choices");
}
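Totals like these are easier to keep consistent when they are derived from per-request records rather than entered by hand. A minimal sketch, assuming each logged record carries `model`, `tokens`, `cost`, and `co2Grams` fields (hypothetical names):

```javascript
// Aggregates per-request usage records into daily totals and a per-model breakdown.
function summarizeUsage(records) {
  const summary = {
    totalRequests: records.length,
    totalTokens: 0,
    totalCost: 0,
    totalCO2Grams: 0,
    modelBreakdown: {},
  };

  for (const record of records) {
    summary.totalTokens += record.tokens;
    summary.totalCost += record.cost;
    summary.totalCO2Grams += record.co2Grams;

    if (!summary.modelBreakdown[record.model]) {
      summary.modelBreakdown[record.model] = { requests: 0, cost: 0, co2: 0 };
    }
    const perModel = summary.modelBreakdown[record.model];
    perModel.requests += 1;
    perModel.cost += record.cost;
    perModel.co2 += record.co2Grams;
  }

  const count = Math.max(records.length, 1); // Avoid dividing by zero on empty days
  summary.avgCostPerRequest = summary.totalCost / count;
  summary.avgCO2PerRequest = summary.totalCO2Grams / count;
  return summary;
}

console.log(summarizeUsage([
  { model: 'GPT-4o-Mini', tokens: 500, cost: 0.00075, co2Grams: 0.32 },
  { model: 'Claude-3-Haiku', tokens: 400, cost: 0.00032, co2Grams: 0.256 },
]));
```

Because the per-model figures and the totals come from the same records, the breakdown always sums to the totals.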

Quick Wins

Immediate Cost Reductions

Action	Time	Savings
Switch simple tasks to Claude 3 Haiku	5 min	50-80%
Add max token limits	2 min	30-50%
Shorten prompts	15 min	20-40%
Enable caching	30 min	20-40%
Implement model routing	2 hours	60-90%

Total Potential Savings: 70-95%

Cost Calculator

Monthly Cost Estimator

Inputs:
- Requests per month: 10,000
- Avg tokens per request: 500
- Model: GPT-4o Mini

Calculation:
10,000 × 500 × $0.0015 / 1000 = $7.50/month

With model routing (50% Claude 3 Haiku):
- 5,000 × 500 × $0.0015 / 1000 = $3.75
- 5,000 × 500 × $0.0008 / 1000 = $2.00
Total: $5.75/month (23% savings)

ROI of Optimization

Metric	Before	After	Savings
Monthly Cost	$100	$6.88	$93.12/month
Annual Cost	$1,200	$82.56	$1,117.44/year
Monthly CO₂	69 kg	2.37 kg	66.6 kg/month
Annual CO₂	828 kg	28.4 kg	799.6 kg/year