
Agent Cost Optimization

Reduce AI costs by 70%+ with multi-model routing, batch processing, response caching, and quality-based fallback strategies for agents and automation workflows.

Save 70%+ on AI Costs

Many agents overspend by sending simple tasks to expensive models. SkillBoss lets you match each request to the cheapest model that can handle it; the strategies below show how.


Strategy 1: Multi-Model Routing

Route to the cheapest model that meets quality requirements:

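def cost_aware_request(prompt: str, min_quality: float):
    # "cost" is USD per 1M tokens; "quality" is a score between 0 and 1
    models = [
        {"name": "gemini/gemini-2.5-flash", "cost": 0.075, "quality": 0.85},
        {"name": "deepseek/deepseek-r1", "cost": 0.14, "quality": 0.90},
        {"name": "claude-4-5-sonnet", "cost": 15.00, "quality": 0.98}
    ]

    # Select cheapest model meeting quality threshold.
    # Sorting by cost keeps this correct even if the list order changes.
    for model in sorted(models, key=lambda m: m["cost"]):
        if model["quality"] >= min_quality:
            return use_model(model["name"], prompt)

    # No model is good enough: fail loudly instead of returning None
    raise ValueError(f"No model meets min_quality={min_quality}")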

# Simple task: use Gemini Flash (200x cheaper than Claude)
result = cost_aware_request("Summarize this text", min_quality=0.80)

# Complex task: use Claude (worth the cost)
result = cost_aware_request("Write legal contract", min_quality=0.95)

Savings: 92% on average
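
How much routing saves depends on how your traffic splits across models. A rough sketch of that arithmetic, using the per-1M-token prices from the model list above. The traffic mix is a hypothetical illustration, not measured data, and the helper names are made up for this example:

```python
# Prices in USD per 1M tokens, copied from the model list above.
PRICES = {
    "gemini/gemini-2.5-flash": 0.075,
    "deepseek/deepseek-r1": 0.14,
    "claude-4-5-sonnet": 15.00,
}


def blended_cost(mix: dict[str, float]) -> float:
    """Average cost per 1M tokens for a traffic mix (shares must sum to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(PRICES[name] * share for name, share in mix.items())


def savings_vs(baseline: str, mix: dict[str, float]) -> float:
    """Fractional savings compared with sending all traffic to `baseline`."""
    return 1.0 - blended_cost(mix) / PRICES[baseline]


# Hypothetical mix: 80% simple, 10% medium, 10% complex requests.
example_mix = {
    "gemini/gemini-2.5-flash": 0.80,
    "deepseek/deepseek-r1": 0.10,
    "claude-4-5-sonnet": 0.10,
}

print(f"Blended cost: ${blended_cost(example_mix):.3f}/1M")
print(f"Savings vs Claude only: {savings_vs('claude-4-5-sonnet', example_mix):.1%}")
```

Plugging in your own request mix gives a more honest estimate than any single headline percentage.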


Strategy 2: Batch Processing

Reduce API overhead by batching:

# ❌ Expensive: 1000 separate API calls
for item in items:
    process(item)  # $1.00 API overhead × 1000 = $1,000

# ✅ Cheap: 10 batch calls
for batch in chunks(items, 100):
    process_batch(batch)  # $1.00 × 10 = $10

# Savings: $990 (99%)
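
The snippet above relies on a `chunks` helper that is not shown. A minimal sketch of one possible implementation, assuming `items` is a list or other sliceable sequence:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


items = list(range(1000))
batches = list(chunks(items, 100))
print(len(batches))  # 10
```

The last batch is shorter when the item count is not a multiple of the batch size, so `process_batch` should not assume a fixed length.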

Strategy 3: Caching

Cache identical requests:

from functools import lru_cache

@lru_cache(maxsize=10000)
def cached_llm(prompt: str):
    return skillboss.chat(prompt)

# First call: $0.01
result1 = cached_llm("What is AI?")

# Subsequent calls: $0.00 (cached)
result2 = cached_llm("What is AI?")
result3 = cached_llm("What is AI?")

# Savings: $0.01 per cache hit ($0.02 total for the two hits above)

Strategy 4: Quality-Based Fallback

Try cheap first, upgrade if needed:

def smart_request(prompt: str):
    # Try Gemini Flash ($0.075/1M)
    result = skillboss.chat(prompt, model="gemini-flash")

    # Check quality
    if quality_score(result) < 0.85:
        # Retry with Claude ($15/1M)
        result = skillboss.chat(prompt, model="claude-4-5")

    return result

# Assuming 80% of requests pass the quality check on the cheap model
# and the other 20% are retried on the expensive model:
# Average cost: $0.075 + 20% × $15 = $3.075/1M vs $15/1M
# Savings: 79.5%
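
The average-cost figure follows from the fact that every request pays for the cheap attempt and only the failures also pay for the retry. A short sketch of that calculation, assuming both attempts use a similar number of tokens; the function names are illustrative:

```python
def expected_cost(cheap: float, expensive: float, fallback_rate: float) -> float:
    """Average cost when every request tries the cheap model first and a
    fraction `fallback_rate` of requests is retried on the expensive one."""
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return cheap + fallback_rate * expensive


def break_even_rate(cheap: float, expensive: float) -> float:
    """Fallback rate above which always using the expensive model is cheaper."""
    return 1.0 - cheap / expensive


avg = expected_cost(cheap=0.075, expensive=15.0, fallback_rate=0.20)
print(f"${avg:.3f}/1M")                        # $3.075/1M
print(f"{1 - avg / 15.0:.1%}")                 # 79.5%
print(f"{break_even_rate(0.075, 15.0):.1%}")   # 99.5%
```

With prices this far apart, the fallback strategy stays cheaper until almost every request needs the retry, so the accuracy of `quality_score` matters more than the exact fallback rate.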

Strategy 5: Context Window Optimization

Use models with larger context windows to reduce API calls:

# ❌ Expensive: Multiple calls with small context
responses = []
for chunk in document_chunks:
    response = skillboss.chat(chunk, model="gpt-4o")  # 128K context
    responses.append(response)
# Cost: 10 calls × $0.15 = $1.50

# ✅ Cheap: Single call with large context
response = skillboss.chat(entire_document, model="gemini-2.5-flash")  # 1M context
# Cost: 1 call × $0.075 = $0.075
# Savings: $1.425 (95%)
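
Before sending a whole document in one call, it helps to check that it actually fits. A rough sketch, assuming about 4 characters per token (a common rule of thumb for English text, not an exact count) and a hypothetical reserve of tokens for the prompt and the reply:

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; a real tokenizer would give exact counts."""
    return math.ceil(len(text) / chars_per_token)


def calls_needed(text: str, context_tokens: int, reserve_tokens: int = 4000) -> int:
    """Number of calls needed if `text` is split to fit the usable context."""
    usable = context_tokens - reserve_tokens
    if usable <= 0:
        raise ValueError("reserve_tokens must be smaller than context_tokens")
    return max(1, math.ceil(estimate_tokens(text) / usable))


document = "x" * 2_000_000  # about 500K tokens under the 4-chars-per-token assumption
print(calls_needed(document, context_tokens=128_000))    # 5
print(calls_needed(document, context_tokens=1_000_000))  # 1
```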

Strategy 6: Precompute Common Tasks

Generate common responses once, reuse forever:

# Precompute 100 common responses
common_questions = load_faq()
precomputed = {}

for q in common_questions:
    precomputed[q] = skillboss.chat(q, model="claude-4-5")
# One-time cost: 100 × $0.01 = $1.00

# Serve from precomputed cache
def answer_question(question):
    if question in precomputed:
        return precomputed[question]  # $0.00
    else:
        return skillboss.chat(question)  # $0.01

# 90% of questions are common
# Savings: 90% × $0.01 = $0.009 per request
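
Precomputing only pays off once enough requests hit the cache to cover the one-time cost. A quick sketch of the break-even point, using the figures from the comments above; the helper names are illustrative:

```python
import math


def avg_cost_per_request(hit_rate: float, live_cost: float) -> float:
    """Only cache misses pay for a live model call."""
    return (1.0 - hit_rate) * live_cost


def break_even_requests(one_time_cost: float, hit_rate: float, live_cost: float) -> int:
    """Requests needed before per-request savings cover the precompute cost."""
    saved_per_request = hit_rate * live_cost
    if saved_per_request <= 0:
        raise ValueError("hit_rate and live_cost must be positive")
    return math.ceil(one_time_cost / saved_per_request)


print(f"${avg_cost_per_request(0.90, 0.01):.3f}")   # $0.001
print(break_even_requests(1.00, 0.90, 0.01))        # 112
```

With these numbers the $1.00 precompute cost is recovered after roughly 112 requests; a lower hit rate pushes that point further out.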


Real-World Optimization Examples

Example 1: Content Creator Agent

Before optimization:

  • Uses Claude 4.5 for all 50 posts/day
  • Cost: 50 × $0.30 = $15/day

After optimization:

  • 40 posts: Gemini Flash ($0.002 each) = $0.08
  • 10 posts: Claude 4.5 ($0.30 each) = $3.00
  • Total: $3.08/day
  • Savings: $11.92/day (79%)

Example 2: Research Agent

Before:

  • Claude 4.5 for all 100 documents/day
  • Cost: 100 × $0.75 = $75/day

After:

  • Batch process 100 docs with Gemini 2.5 Flash (1M context)
  • Cost: $3.75/day
  • Savings: $71.25/day (95%)

Cost Monitoring Dashboard

Track optimization impact:

analytics = skillboss.get_analytics(period="last_30_days")

print(f"""
Cost Optimization Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total spent: ${analytics['total_spent']}
Average cost/day: ${analytics['daily_average']}

Model Usage:
- Gemini Flash: {analytics['gemini_percent']}% (cheapest)
- DeepSeek: {analytics['deepseek_percent']}%
- Claude: {analytics['claude_percent']}% (most expensive)

Optimization Opportunities:
{analytics['recommendations']}

Potential monthly savings: ${analytics['potential_savings']}
""")

Next Steps

  • Multi-Model Routing: Automatic model selection
  • Agent Pricing: Full pricing breakdown
  • Usage Tracking: Monitor your optimization
  • Budget Management: Set spending limits
