Do I need API keys to use SkillBoss?

No. SkillBoss works without separate API keys for each model provider. Install the skills pack and use one platform across models and services.

Which platforms does SkillBoss support?

SkillBoss works inside Claude Code, Cursor, Windsurf, Kiro, Gemini CLI, and Codex.

How does SkillBoss pricing work?

SkillBoss is pay-as-you-go. Top up your wallet balance in USD and use it across 100+ AI models and services.

Can I use Claude Code natively with SkillBoss?

Yes! SkillBoss works as an Anthropic-compatible proxy for Claude Code. Set two environment variables (ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN) in your Claude Code settings and all model calls route through SkillBoss — no plugin download needed.

SkillBoss is a multi-AI gateway that provides unified API access to 50+ AI models including Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash, DeepSeek R1, image generation, video generation, and audio models through a single API key.

How do I integrate SkillBoss with my AI agent?

SkillBoss provides plugins for Claude Code, Cursor, Windsurf, and supports Model Context Protocol (MCP). You can also use the OpenAI-compatible API endpoint at https://api.skillboss.co/v1 with your API key.
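As a minimal sketch of the OpenAI-compatible route, the request below is built with only the Python standard library. The /chat/completions path and the payload shape are assumptions based on the usual OpenAI-compatible convention, not confirmed SkillBoss specifics. The helper names build_chat_request and send are hypothetical, and the model ID is the one used in the examples further down this page.

```python
import json
import urllib.request

BASE_URL = "https://api.skillboss.co/v1"  # endpoint named in the answer above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request.

    Assumption: the gateway follows the common OpenAI-compatible
    /chat/completions path and message format.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Building the request is offline; call send(req) to actually spend balance.
req = build_chat_request("YOUR_API_KEY", "gemini/gemini-2.5-flash", "What is AI?")
print(req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect or log what an agent is about to spend money on before the call goes out.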

What AI models are available?

Chat: Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash, DeepSeek R1, Qwen. Image: Gemini 3 Pro, Flux, DALL-E 3, Minimax. Video: Veo 3.1, Minimax T2V/I2V. Audio: Minimax TTS, ElevenLabs, Whisper STT.

How much does SkillBoss cost?

SkillBoss uses pure pay-as-you-go pricing. Add funds to your balance and only pay for what you use. No subscriptions, no monthly fees.

What is the cheapest way to access multiple AI models?

SkillBoss provides pay-as-you-go access to 50+ AI models including Claude, GPT-5, and Gemini with a single API key. Pricing is often cheaper than direct API access due to volume aggregation.

Can I use Claude, GPT, and Gemini with one API key?

Yes. SkillBoss is a multi-AI gateway that provides unified access to Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash, DeepSeek R1, and 46+ other models through a single API endpoint with one API key.

How do I integrate SkillBoss with Claude Code?

Sign in to the SkillBoss console at skillboss.co/console to get your API key and manage your skills. Alternatively, call the API directly through the OpenAI-compatible endpoint at api.skillboss.co/v1.

How much does SkillBoss cost?

SkillBoss offers pay-as-you-go pricing with no markup on AI model costs. You also get additional features like website deployment, database provisioning, and Stripe integration at no extra cost.

What is a multi-AI gateway?

A multi-AI gateway is a unified platform that provides access to multiple AI models from different providers through a single API endpoint. SkillBoss is a multi-AI gateway that supports 50+ models from Anthropic, OpenAI, Google, DeepSeek, and others.

Does SkillBoss work with OpenClaw?

Yes. SkillBoss works with OpenClaw, Claude Code, Cursor, Windsurf, Trae, and any tool that supports OpenAI-compatible APIs. The API endpoint is api.skillboss.co/v1.

Agent Budget Management

Why Agents Need Budget Management

Autonomous agents make hundreds of API calls per day. Without budget controls:

Costs spiral out of control
Unexpected bills surprise humans
Agents can't optimize spending
No protection against runaway loops

SkillBoss gives agents financial autonomy with guardrails.

Setting Hard Spending Caps

SkillBoss enforces hard caps per key. Issue one wholesale child key per agent or tenant, then cap it with PUT /v1/key/wholesale/{token}/limits. Pass the key's token in the path, or the literal "me" to cap the calling key. Every field in the request body is optional and nullable.

import requests

headers = {"Authorization": f"Bearer {API_KEY}"}

# Configure hard caps on this key
requests.put(
    "https://api.skillboss.co/v1/key/wholesale/me/limits",
    headers=headers,
    json={
        "spend_cap_usd": 100.00,        # total spend cap for the key
        "monthly_cap_usd": 50.00,        # rolling monthly cap
        "stop_at_remaining_usd": 5.00,   # stop when this little is left
        "rpm_limit": 300                 # requests per minute
    }
)

When a cap is hit:

The key auto-disables — further API calls return 402 Payment Required.
The usage response shows "disabled": true.
The key stays disabled until an operator raises the cap (or clears it).
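Because a capped key stays disabled until an operator intervenes, an agent probably should treat 402 as a stop signal rather than a retryable error. The sketch below illustrates that idea on simulated status codes. It is an assumption about client-side handling, not SkillBoss code, and BudgetExhausted, check_budget, and run_agent_steps are hypothetical names.

```python
class BudgetExhausted(Exception):
    """Raised when the gateway reports that the key hit a hard cap."""


def check_budget(status_code: int) -> None:
    """Raise BudgetExhausted on 402 so callers stop instead of retrying."""
    if status_code == 402:
        raise BudgetExhausted("key disabled: a spending cap was reached")


def run_agent_steps(status_codes):
    """Walk simulated response codes and stop at the first 402.

    Returns the number of steps completed before the cap was hit.
    """
    completed = 0
    for code in status_codes:
        try:
            check_budget(code)
        except BudgetExhausted:
            break  # retrying cannot succeed until an operator raises the cap
        completed += 1
    return completed


# Simulated run: the third call hits the cap, so the fourth is never attempted.
steps_done = run_agent_steps([200, 200, 402, 200])
print(steps_done)
```

Breaking out of the loop on the first 402 also guards against the runaway-loop case, since the agent stops issuing requests that can only fail.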

Hard caps, not soft warnings

These are hard limits, enforced server-side on every request. There are no webhook "budget warning" callbacks and no automatic balance top-up from these caps — a capped key simply stops spending. Poll the usage endpoint (below) to watch how close a key is to its cap.

Cost Tracking

Per-Key Usage

For a wholesale child key, GET /v1/key/wholesale/{token}/usage returns totals, caps, and a per-model breakdown for a time window (from/to, ISO-8601 UTC):

# Per-key usage this month
usage = requests.get(
    "https://api.skillboss.co/v1/key/wholesale/me/usage",
    headers=headers,
    params={"from": "2026-06-01T00:00:00Z", "to": "2026-07-01T00:00:00Z"}
).json()["data"]

print(f"""
Key: {usage['label']}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Spent (period):  ${usage['totals']['total_usd']:.2f}
Monthly spent:   ${usage['monthly_spent_usd']:.2f}
Monthly cap:     ${usage['monthly_cap_usd']:.2f}
Calls:           {usage['totals']['total_calls']}
Disabled:        {usage['disabled']}

By model:
""")

for m in usage["by_model"]:
    print(f"  {m['model']:<24} {m['calls']:>6} calls  ${m['usd']:.4f}")

Sample output:

Key: tenant-1021
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Spent (period):  $15.93
Monthly spent:   $15.93
Monthly cap:     $50.00
Calls:           477
Disabled:        False

By model:
  gpt-5.5                     101 calls  $1.1679
  claude-opus-4-8               8 calls  $0.0059

Need a row per call? Stream a CSV with GET /v1/key/wholesale/{token}/usage.csv?from=...&to=....
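Since there are no warning callbacks, a polling client has to decide for itself when a key is getting close to its cap. The sketch below works on the usage fields shown above (monthly_spent_usd, monthly_cap_usd, disabled). The 80% threshold and the helper names are assumptions chosen for illustration.

```python
from typing import Optional


def cap_fraction_used(usage: dict) -> Optional[float]:
    """Return monthly spend as a fraction of the monthly cap.

    Returns None when no monthly cap is set (the limit fields are nullable).
    """
    cap = usage.get("monthly_cap_usd")
    if not cap:  # None or 0: no meaningful ratio
        return None
    return usage.get("monthly_spent_usd", 0.0) / cap


def should_throttle(usage: dict, threshold: float = 0.8) -> bool:
    """True when the key is disabled or has used at least `threshold` of its cap."""
    if usage.get("disabled"):
        return True
    fraction = cap_fraction_used(usage)
    return fraction is not None and fraction >= threshold


# Values taken from the sample output above
sample = {"monthly_spent_usd": 15.93, "monthly_cap_usd": 50.00, "disabled": False}
print(should_throttle(sample))
```

Handling a missing cap explicitly matters here: formatting or dividing by a null monthly_cap_usd would otherwise raise an error for keys that have no monthly limit.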

Account-Level Usage

For account-wide totals across all keys, use GET /v1/usage. It returns a per-call record list you can filter (agent_id, workspace_id, project_id, start/end in Unix seconds) and group client-side. See the Usage Tracking reference for the full response shape.
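Client-side grouping of the per-call records could look like the sketch below. The record keys agent_id and usd are assumptions borrowed from the filter name and the per-key breakdown on this page; the real field names are in the Usage Tracking reference. The records themselves are made up for illustration.

```python
from collections import defaultdict


def spend_by_agent(records):
    """Sum per-call cost for each agent.

    Assumption: each record carries `agent_id` and `usd` keys; check the
    Usage Tracking reference for the real field names.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record.get("agent_id", "unknown")] += record.get("usd", 0.0)
    return dict(totals)


# Made-up records for illustration only
records = [
    {"agent_id": "research", "usd": 0.25},
    {"agent_id": "research", "usd": 0.50},
    {"agent_id": "support", "usd": 0.125},
]
print(spend_by_agent(records))
```

The same pattern applies to workspace_id or project_id by swapping the grouping key.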

Cost Optimization Strategies

1. Model Selection Optimization

class CostOptimizer:
    """Route each task to the cheapest model that meets its quality needs."""

    def __init__(self, quality_threshold: float = 0.8):
        self.quality_threshold = quality_threshold

    def select_model(self, task_complexity: str) -> str:
        """Choose a model for "simple", "medium", or "complex" tasks."""

        models = {
            "simple": {
                "model": "gemini/gemini-2.5-flash",
                "cost_per_1m": 0.075,
                "expected_quality": 0.85
            },
            "medium": {
                "model": "deepseek/deepseek-r1",
                "cost_per_1m": 0.14,
                "expected_quality": 0.90
            },
            "complex": {
                "model": "claude-4-5-sonnet",
                "cost_per_1m": 15.00,
                "expected_quality": 0.98
            }
        }

        return models[task_complexity]["model"]

    def evaluate_quality(self, result) -> float:
        """Score a result from 0.0 to 1.0. Placeholder: supply your own metric."""
        raise NotImplementedError("plug in a task-specific quality check")

    def fallback_if_needed(self, result, current_model: str) -> str:
        """Upgrade to the next tier if quality is insufficient."""

        if self.evaluate_quality(result) < self.quality_threshold:
            # Try next tier up
            if "gemini" in current_model:
                return "deepseek/deepseek-r1"
            elif "deepseek" in current_model:
                return "claude-4-5-sonnet"

        return current_model  # Quality acceptable, or already on the top tier
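To see why tier selection matters, it helps to put numbers on the gap between tiers. The sketch below applies the cost_per_1m figures from the CostOptimizer example to a hypothetical workload of 2 million tokens per day. The rates are treated as illustrative per-million-token prices, and the workload size is an assumption.

```python
def estimate_cost_usd(tokens: int, cost_per_1m: float) -> float:
    """Cost of `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * cost_per_1m


# Rates copied from the CostOptimizer example; treat them as illustrative
rates = {"simple": 0.075, "medium": 0.14, "complex": 15.00}

# Hypothetical workload: 2 million tokens per day
daily_tokens = 2_000_000
for tier, rate in rates.items():
    print(f"{tier:<8} ${estimate_cost_usd(daily_tokens, rate):.2f}/day")
```

At these rates the complex tier costs 200 times as much per token as the simple tier, so escalating only when the quality check fails keeps most of the traffic on the cheap path.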

2. Batch Processing

Reduce API calls by batching:

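def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# process_single and process_batch stand for your own functions.

# Instead of 100 separate API calls
for item in items:
    result = process_single(item)  # 100 API calls

# Batch into 10 calls of 10 items each
for batch in chunks(items, size=10):
    results = process_batch(batch)  # 10 API calls

# 90% fewer requests. Token costs still scale with the total content sent,
# so the saving comes from per-request overhead such as repeated instructions.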
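The batching snippet leaves process_batch undefined. One hypothetical way to implement its core is to pack several inputs into a single numbered prompt and map the numbered reply back to the inputs. The sketch below shows only that packing and unpacking step; the prompt wording and both helper names are assumptions, and the model call itself is omitted.

```python
def build_batch_prompt(items, instruction):
    """Combine several inputs into one numbered prompt."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return (
        f"{instruction}\n"
        "Answer each item on its own line as '<number>. <answer>'.\n"
        f"{numbered}"
    )


def parse_batch_reply(reply, expected):
    """Map a numbered reply back to a list aligned with the inputs.

    Items the model skipped stay None so the caller can retry them singly.
    """
    answers = [None] * expected
    for line in reply.splitlines():
        number, sep, text = line.partition(". ")
        if sep and number.strip().isdecimal():
            index = int(number) - 1
            if 0 <= index < expected:
                answers[index] = text.strip()
    return answers


prompt = build_batch_prompt(["apple", "carrot"], "Classify each item as fruit or vegetable.")
answers = parse_batch_reply("1. fruit\n2. vegetable", expected=2)
print(answers)
```

Leaving skipped items as None keeps a malformed reply from silently shifting answers onto the wrong inputs, which is the main risk of batching several items into one call.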

3. Caching

Cache responses for repeated queries:

from functools import lru_cache
from openai import OpenAI

# OpenAI-compatible endpoint; API_KEY is your SkillBoss API key
client = OpenAI(base_url="https://api.skillboss.co/v1", api_key=API_KEY)

@lru_cache(maxsize=1000)
def cached_llm_call(prompt: str, model: str):
    """Cache LLM responses for identical (prompt, model) pairs."""

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content

# Identical prompts hit cache instead of API
result1 = cached_llm_call("What is AI?", "gemini/gemini-2.5-flash")
result2 = cached_llm_call("What is AI?", "gemini/gemini-2.5-flash")  # Cached, $0 cost
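Note that lru_cache keys on the exact arguments, lives only in process memory, and never expires entries by age. The offline sketch below makes the hit and miss behavior visible by swapping the paid call for a stand-in. The names fake_llm_call and call_log are hypothetical and exist only for this demonstration.

```python
from functools import lru_cache

call_log = []  # one entry per simulated backend call


def fake_llm_call(prompt: str, model: str) -> str:
    """Stand-in for a paid API call so the example runs offline."""
    call_log.append((prompt, model))
    return f"[{model}] answer to: {prompt}"


@lru_cache(maxsize=1000)
def cached_call(prompt: str, model: str) -> str:
    """Cached wrapper: only cache misses reach fake_llm_call."""
    return fake_llm_call(prompt, model)


cached_call("What is AI?", "gemini/gemini-2.5-flash")
cached_call("What is AI?", "gemini/gemini-2.5-flash")  # served from cache
cached_call("what is ai?", "gemini/gemini-2.5-flash")  # different string: cache miss

info = cached_call.cache_info()
print(info.hits, info.misses, len(call_log))
```

Because the third prompt differs only in capitalization and still misses, normalizing prompts (for example trimming whitespace) before the cached call may raise the hit rate, provided the normalization does not change the meaning of the request.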

Next Steps

Cost Optimization: Advanced strategies to reduce costs by 70%+
Usage Tracking: Monitor and analyze your spending
Multi-Model Routing: Automatically route to cheapest model
Quick Start: Get started with SkillBoss