Cost Optimization
Strategies to reduce AI costs by 50-90% while maintaining quality.
Understanding Costs
Pricing Structure
| Model | Input (per 1K tokens) | Output (per 1K tokens) | Total |
|---|---|---|---|
| GPT-4 Omni | $0.03 | $0.06 | $0.09 |
| GPT-4o Mini | $0.006 | $0.012 | $0.018 |
| GPT-3.5 Turbo | $0.002 | $0.004 | $0.006 |
Token Calculation
1 token ≈ 4 characters
1 token ≈ 0.75 words
Example:
"Hello, how are you?" = 5 tokens
Request: 150 tokens (input) + 300 tokens (output) = 450 total tokens
Strategy 1: Right-Size Your Model
The Problem
Using GPT-4 Omni for everything = 💰💰💰
10,000 requests/month × $0.09 per request = $900/month
The Solution
Model Routing Based on Complexity
function selectModel(request) {
complexity = analyzeComplexity(request);
if (complexity === "HIGH") {
return "GPT-4-Omni"; // 10% of requests
} else if (complexity === "MEDIUM") {
return "GPT-4o-Mini"; // 60% of requests
} else {
return "GPT-3.5-Turbo"; // 30% of requests
}
}
Savings Calculation
10,000 requests/month
Before (all GPT-4 Omni):
10,000 × $0.09 = $900/month
After (mixed):
- 1,000 × $0.09 (GPT-4 Omni) = $90
- 6,000 × $0.018 (GPT-4o Mini) = $108
- 3,000 × $0.006 (GPT-3.5) = $18
Total = $216/month
Savings: $684/month (76%)
Strategy 2: Optimize Token Usage
Reduce Input Tokens
❌ Verbose Prompts:
I would really appreciate it if you could help me by analyzing
this customer feedback and providing me with a detailed summary
of the main points that the customer is trying to communicate.
Here is the feedback: "{feedback}"
Tokens: ~50
✅ Concise Prompts:
Summarize this customer feedback:
"{feedback}"
Tokens: ~8
Savings: 84% fewer input tokens
Limit Output Tokens
❌ No Limit:
{
prompt: "Explain AI",
maxTokens: null // Could generate 2000+ tokens
}
Average: 800 tokens
✅ Set Appropriate Limit:
{
prompt: "Explain AI",
maxTokens: 300 // Enough for good explanation
}
Average: 300 tokens
Savings: 62% fewer output tokens
Remove Unnecessary Context
❌ Too Much Context:
Background: [500 tokens of company history]
Guidelines: [300 tokens of policies]
Examples: [400 tokens of examples]
Question: "What is our return policy?"
Input tokens: ~1250
✅ Essential Context Only:
Return Policy: 30 days, receipt required
Question: "What is our return policy?"
Input tokens: ~15
Savings: 99% reduction
Strategy 3: Caching & Reuse
Cache Common Responses
// Check cache first
cachedResponse = cache.get(promptHash);
if (cachedResponse) {
return cachedResponse; // $0 cost!
}
// Only call API if not cached
response = await callAPI(prompt);
cache.set(promptHash, response, ttl=3600);
Savings Example:
- 10,000 requests/month
- 30% are repeat questions
- 3,000 × $0.018 = $54 saved/month
Batch Processing
❌ Individual Requests:
for (item of items) {
await processItem(item); // 100 API calls
}
Cost: 100 × $0.018 = $1.80
✅ Batch Processing:
await processBatch(items); // 1 API call
Cost: 1 × $0.018 = $0.018
Savings: 99%
Strategy 4: Prompt Engineering
Be Specific
❌ Vague:
"Tell me about the customer"
Result: 500+ tokens explaining everything
✅ Specific:
"Customer status: [Active/Inactive]"
Result: 2-3 tokens
Savings: 99% fewer output tokens
Use Structured Output
❌ Free-form:
"Analyze this data and give me insights"
Result: 800 tokens of prose
✅ Structured:
"Analyze this data:
1. Key trend: [1 sentence]
2. Recommendation: [1 sentence]
3. Risk: [1 word]"
Result: 50 tokens
Savings: 94%
Provide Examples (Few-Shot)
❌ No Examples:
"Classify sentiment of: '{text}'"
Needs: High temperature, multiple retries
✅ With Examples:
"Classify sentiment:
'Love it!' → Positive
'It's okay' → Neutral
'Terrible' → Negative
'{text}' → "
Needs: Lower temperature, single try
Savings: Fewer retries = 50% cost reduction
Strategy 5: Parameter Optimization
Lower Temperature for Consistency
❌ High Temperature:
{
temperature: 1.0,
// Generates long, creative responses
// May need multiple tries to get good output
}
Average attempts: 2.5 Cost: 2.5 × $0.018 = $0.045
✅ Low Temperature:
{
temperature: 0.3,
// Generates consistent, focused responses
// Usually good on first try
}
Average attempts: 1.0 Cost: 1.0 × $0.018 = $0.018
Savings: 60%
Smart Max Tokens
Test to find optimal:
// Test different limits
tokenTests = [200, 400, 600, 800];
for (limit of tokenTests) {
responses = testWithLimit(limit);
if (responses.allComplete && responses.quality > 0.9) {
optimalLimit = limit;
break; // Use lowest limit that works
}
}
Strategy 6: Monitoring & Alerts
Track Costs in Real-Time
dailyCost = {
date: "2025-12-10",
totalRequests: 1250,
totalTokens: 562500,
totalCost: 10.13,
avgCostPerRequest: 0.008,
budget: 15.00,
percentUsed: 67.5%
}
if (dailyCost.totalCost > dailyCost.budget * 0.8) {
alert("Approaching daily budget limit");
}
Cost Attribution
Track costs by use case:
costs = {
customerSupport: $120, // 40%
dataAnalysis: $90, // 30%
contentGeneration: $60, // 20%
other: $30 // 10%
}
Optimize highest-cost categories first.
Real-World Examples
Example 1: Customer Support Bot
Before Optimization:
- Model: GPT-4 Omni for all
- Avg tokens: 600
- Requests: 5,000/month
- Cost: $450/month
After Optimization:
- Model: GPT-3.5 (80%), GPT-4o Mini (20%)
- Avg tokens: 350 (optimized prompts)
- Caching: 30% hit rate
- Requests: 3,500 to API (1,500 cached)
- Cost: $45/month
Savings: $405/month (90%)
Example 2: Data Classification
Before Optimization:
- Model: GPT-4 Omni
- No examples in prompt
- Retry logic: 3 attempts average
- Cost: 10,000 × $0.09 × 3 = $2,700/month
After Optimization:
- Model: GPT-3.5 Turbo
- Few-shot examples
- Temperature: 0.2
- Single attempt: 95% success
- Cost: 10,000 × $0.006 × 1.05 = $63/month
Savings: $2,637/month (98%)
Cost Optimization Checklist
Before Deployment
- Tested with cheapest model first (GPT-3.5)
- Optimized prompt length
- Set appropriate token limits
- Implemented caching strategy
- Defined model routing rules
- Established cost monitoring
- Set budget alerts
After Deployment
- Monitor daily costs
- Track cost per use case
- Analyze token usage patterns
- Identify optimization opportunities
- A/B test cheaper alternatives
- Review and adjust monthly
Quick Wins
Immediate Cost Reductions
-
Switch to GPT-3.5 for simple tasks
- Savings: 85-93%
- Time: 5 minutes
-
Add max token limits
- Savings: 30-50%
- Time: 2 minutes
-
Shorten prompts
- Savings: 20-40%
- Time: 15 minutes
-
Enable caching
- Savings: 20-40%
- Time: 30 minutes
-
Batch similar requests
- Savings: 50-90%
- Time: 1 hour
Total Potential Savings: 70-90%
Cost Calculator
Monthly Cost Estimator
Requests per month: _______
Avg tokens per request: _______
Model: [ ] GPT-4 Omni [ ] GPT-4o Mini [ ] GPT-3.5
Calculation:
Requests × Tokens × (Price per 1K tokens) / 1000 = Monthly Cost
Example:
10,000 × 450 × $0.018 / 1000 = $81/month
Related Documentation
- Model Comparison - Choose the right model
- Parameters Reference - Optimize settings
- Best Practices - Testing strategies
- Temperature Guide - Reduce retries
Implement these strategies to reduce costs by 50-90% while maintaining quality.