Do I need API keys to use SkillBoss?

No. SkillBoss works without API keys. Install the skills pack and use one platform across models and services.

Which platforms does SkillBoss support?

SkillBoss works inside Claude Code, Cursor, Windsurf, Kiro, Gemini CLI, and Codex.

How does SkillBoss pricing work?

SkillBoss is pay-as-you-go. Top up your wallet balance in USD and use it across 100+ AI models and services.

Can I use Claude Code natively with SkillBoss?

Yes! SkillBoss works as an Anthropic-compatible proxy for Claude Code. Set two environment variables (ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN) in your Claude Code settings and all model calls route through SkillBoss — no plugin download needed.

SkillBoss is a multi-AI gateway that provides unified API access to 50+ AI models including Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash, DeepSeek R1, image generation, video generation, and audio models through a single API key.

How do I integrate SkillBoss with my AI agent?

SkillBoss provides plugins for Claude Code, Cursor, and Windsurf, and supports the Model Context Protocol (MCP). You can also use the OpenAI-compatible API endpoint at https://api.skillboss.co/v1 with your API key.

What AI models are available?

Chat: Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash, DeepSeek R1, Qwen.
Image: Gemini 3 Pro, Flux, DALL-E 3, Minimax.
Video: Veo 3.1, Minimax T2V/I2V.
Audio: Minimax TTS, ElevenLabs, Whisper STT.

How much does SkillBoss cost?

SkillBoss uses pure pay-as-you-go pricing. Add funds to your balance and only pay for what you use. No subscriptions, no monthly fees.

What is the cheapest way to access multiple AI models?

SkillBoss provides pay-as-you-go access to 50+ AI models including Claude, GPT-5, and Gemini with a single API key. Pricing is often cheaper than direct API access due to volume aggregation.

Can I use Claude, GPT, and Gemini with one API key?

Yes. SkillBoss is a multi-AI gateway that provides unified access to Claude Sonnet 4.6, GPT-5, Gemini 2.5 Flash, DeepSeek R1, and 46+ other models through a single API endpoint with one API key.

How do I integrate SkillBoss with Claude Code?

Sign in to the SkillBoss console at skillboss.co/console to get your API key and manage your skills. Or use the API directly with the OpenAI-compatible endpoint at api.skillboss.co/v1.

How much does SkillBoss cost?

SkillBoss offers pay-as-you-go pricing with no markup on AI model costs. You also get additional features like website deployment, database provisioning, and Stripe integration at no extra cost.

What is a multi-AI gateway?

A multi-AI gateway is a unified platform that provides access to multiple AI models from different providers through a single API endpoint. SkillBoss is a multi-AI gateway that supports 50+ models from Anthropic, OpenAI, Google, DeepSeek, and others.

Does SkillBoss work with OpenClaw?

Yes. SkillBoss works with OpenClaw, Claude Code, Cursor, Windsurf, Trae, and any tool that supports OpenAI-compatible APIs. The API endpoint is api.skillboss.co/v1.

Agent Quick Start

For Autonomous AI Agents

This guide is written for AI agents making purchasing decisions autonomously or with minimal human oversight.

Step 1: Obtain API Key

Your human operator provisions your API key:

Quick setup for AI agents: Tell your agent "set up skillboss.co/skill.md" to auto-configure everything.

Human visits Console

Human goes to skillboss.co/console

Download Skills Pack or Copy API Key

Human downloads skills pack or copies API key from dashboard

Human Adds Key to Agent Config

Human adds key to your configuration:

For MCP agents (Claude Code, OpenClaw):

claude mcp add skillboss --api-key sk-YOUR_KEY

For custom agents:

config.json:
{
  "skillboss_api_key": "sk-YOUR_KEY"
}
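
A custom agent still has to read that key at startup. The following is a minimal sketch of one way to do it, assuming the file holds only the JSON object shown above. The function name `load_skillboss_key` and the `SKILLBOSS_API_KEY` environment variable are hypothetical names chosen for illustration, not documented SkillBoss conventions.

```python
import json
import os
from pathlib import Path


def load_skillboss_key(config_path: str = "config.json") -> str:
    """Return the API key from the config file, else from the environment."""
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        key = config.get("skillboss_api_key")
        if key:
            return key
    # Hypothetical fallback variable; not a documented SkillBoss name.
    key = os.environ.get("SKILLBOSS_API_KEY")
    if key:
        return key
    raise RuntimeError("No SkillBoss API key found in config file or environment")
```

Failing loudly when no key is found keeps an agent from sending unauthenticated requests and misreading the resulting errors.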

Step 2: Test API Access

Make your first API call to verify access:

from openai import OpenAI

# SkillBoss OpenAI-compatible endpoint
client = OpenAI(
    api_key="sk-YOUR_KEY",
    base_url="https://api.skillboss.co/v1"
)

# Test with cheapest model
response = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # $0.075/1M tokens
    messages=[{"role": "user", "content": "Hello SkillBoss"}]
)

print(response.choices[0].message.content)
# Output: "Hello! I'm responding via SkillBoss gateway..."

Success: You now have access to 50+ AI models through one API key.

Step 3: Check Available Services

Query the service catalog programmatically:

import requests

api_key = "sk-YOUR_KEY"  # same key as in Step 2
headers = {"Authorization": f"Bearer {api_key}"}

# List all available models
models = requests.get(
    "https://api.skillboss.co/v1/models",
    headers=headers
).json()

print(f"Available models: {len(models['data'])}")

# List all available services (non-LLM)
services = requests.get(
    "https://api.skillboss.co/v1/services",
    headers=headers
).json()

print(f"Available services: {len(services['data'])}")
# Output: Available services: 47 (Stripe, SendGrid, hosting, etc.)

Sample output:

{
  "models": [
    {"id": "claude-4-5-sonnet", "cost_per_1m_tokens": 15.00},
    {"id": "openai/gpt-5", "cost_per_1m_tokens": 10.00},
    {"id": "gemini/gemini-2.5-flash", "cost_per_1m_tokens": 0.075},
    {"id": "deepseek/deepseek-r1", "cost_per_1m_tokens": 0.14}
  ],
  "services": [
    {"id": "stripe", "cost_per_operation": 5},
    {"id": "sendgrid", "cost_per_email": 0.5},
    {"id": "linkedin_scraper", "cost_per_profile": 2}
  ]
}
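
Once the catalog is loaded, an agent can choose a model by price. This sketch assumes catalog entries carry the `id` and `cost_per_1m_tokens` fields shown in the sample output; the helper name `cheapest_model` and its `max_cost_per_1m` parameter are illustrative.

```python
from typing import Optional


def cheapest_model(models: list[dict],
                   max_cost_per_1m: Optional[float] = None) -> Optional[str]:
    """Return the id of the lowest-cost model, optionally under a price ceiling."""
    candidates = [
        m for m in models
        if max_cost_per_1m is None or m["cost_per_1m_tokens"] <= max_cost_per_1m
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["cost_per_1m_tokens"])["id"]


# Entries copied from the sample output above
catalog_models = [
    {"id": "claude-4-5-sonnet", "cost_per_1m_tokens": 15.00},
    {"id": "openai/gpt-5", "cost_per_1m_tokens": 10.00},
    {"id": "gemini/gemini-2.5-flash", "cost_per_1m_tokens": 0.075},
    {"id": "deepseek/deepseek-r1", "cost_per_1m_tokens": 0.14},
]

print(cheapest_model(catalog_models))                        # gemini/gemini-2.5-flash
print(cheapest_model(catalog_models, max_cost_per_1m=0.05))  # None
```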

Step 4: Set Hard Spending Caps

If your operator issued you a wholesale child key (one key per agent/tenant), you can enforce hard spending caps on it. When a cap is hit, the key auto-disables until an operator raises it — no runaway spend.

# Set hard caps on this key (use the key's token, or the literal "me")
requests.put(
    "https://api.skillboss.co/v1/key/wholesale/me/limits",
    headers=headers,
    json={
        "spend_cap_usd": 100.00,        # total spend cap for the key
        "monthly_cap_usd": 50.00,        # rolling monthly cap
        "stop_at_remaining_usd": 5.00,   # stop when this little is left
        "rpm_limit": 300                 # requests per minute
    }
)

What happens when a cap is hit

The key auto-disables. Any further API call returns a 402 error, and the usage response shows "disabled": true. Raise a cap (or clear it) to re-enable the key.
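
An agent can detect this state and stop sending requests instead of retrying. The check below is a sketch under two assumptions: that the HTTP status code of the failed call is available to the caller, and that `"disabled"` is a top-level field of the usage response. The function name `spending_blocked` is hypothetical.

```python
from typing import Optional


def spending_blocked(status_code: int, usage: Optional[dict] = None) -> bool:
    """True when a response indicates the key can no longer spend."""
    # A 402 signals that a balance or cap was reached.
    if status_code == 402:
        return True
    # Assumption: the usage payload exposes a top-level "disabled" flag.
    return bool(usage and usage.get("disabled"))


print(spending_blocked(402))                       # True
print(spending_blocked(200, {"disabled": True}))   # True
print(spending_blocked(200, {"disabled": False}))  # False
```

Treating this condition as terminal rather than retryable avoids a loop of failing calls while the key waits for an operator.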

Step 5: Implement Cost-Aware Routing

Route each request to the cheapest model that meets its quality requirements:

def cost_aware_request(prompt: str, min_quality: str):
    """
    Route to cheapest model that meets quality threshold.

    min_quality: "basic" | "medium" | "high"
    """

    # Model selection by quality tier
    models = {
        "basic": "gemini/gemini-2.5-flash",      # $0.075/1M
        "medium": "deepseek/deepseek-r1",         # $0.14/1M
        "high": "claude-4-5-sonnet"       # $15/1M
    }

    model = models[min_quality]

    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

    except Exception:
        # Fall back to the next tier up if the cheaper model fails
        if min_quality == "basic":
            return cost_aware_request(prompt, "medium")
        elif min_quality == "medium":
            return cost_aware_request(prompt, "high")
        else:
            raise  # "high" also failed: re-raise with the original traceback

# Usage
result = cost_aware_request("Summarize this article...", "basic")
# Tries Gemini Flash first, falls back to DeepSeek if needed
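
To see what the tiers mean in dollars, the per-1M-token prices quoted in the comments above can be turned into a rough estimate. This is a sketch only: it uses one flat price per model, as this page does, and ignores any difference between input and output token pricing. The name `estimate_cost_usd` is illustrative.

```python
# Prices per 1M tokens, as quoted in the routing table above
PRICE_PER_1M_USD = {
    "gemini/gemini-2.5-flash": 0.075,
    "deepseek/deepseek-r1": 0.14,
    "claude-4-5-sonnet": 15.00,
}


def estimate_cost_usd(model: str, tokens: int) -> float:
    """Estimate cost as tokens / 1,000,000 * price per 1M tokens."""
    return tokens / 1_000_000 * PRICE_PER_1M_USD[model]


for model in PRICE_PER_1M_USD:
    print(f"{model}: ${estimate_cost_usd(model, 2_000_000):.2f}")
# gemini/gemini-2.5-flash: $0.15
# deepseek/deepseek-r1: $0.28
# claude-4-5-sonnet: $30.00
```

At these prices the same 2M-token workload costs 200 times more on the "high" tier than on the "basic" tier, which is why routing by required quality matters.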

Step 6: Monitor Usage

Track your spending programmatically. GET /v1/usage returns account-level totals plus a per-call record list (defaults to the last 30 days):

from collections import defaultdict

# Get usage (optionally filter by agent_id / workspace_id / project_id, or a time window)
usage = requests.get(
    "https://api.skillboss.co/v1/usage",
    headers=headers,
).json()

# Group cost by model client-side
by_model = defaultdict(float)
for r in usage["records"]:
    by_model[r["model"]] += r["cost_usd"]
top_model = max(by_model, key=by_model.get) if by_model else "n/a"

print(f"""
Usage:
- Spent: ${usage['total_cost_usd']:.2f}
- Requests: {usage['total_requests']}
- Top model: {top_model} (${by_model.get(top_model, 0):.2f})
""")

Sample output:

Usage:
- Spent: $66.69
- Requests: 532
- Top model: openai/gpt-5.4 ($41.02)

Agent optimization tip: If your top model is expensive, analyze whether a cheaper model would meet quality needs for the bulk of your requests.
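
One way to act on this tip is to compute each model's share of total spend and flag any model above a threshold. The sketch below assumes usage records have the `model` and `cost_usd` fields used in the snippet above; the 50% threshold and the function names are arbitrary illustrations.

```python
from collections import defaultdict


def spend_share_by_model(records: list[dict]) -> dict[str, float]:
    """Return each model's fraction of total spend."""
    totals = defaultdict(float)
    for record in records:
        totals[record["model"]] += record["cost_usd"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {model: cost / grand_total for model, cost in totals.items()}


def models_to_review(records: list[dict], threshold: float = 0.5) -> list[str]:
    """List models whose share of spend is at or above the threshold."""
    shares = spend_share_by_model(records)
    return sorted(m for m, share in shares.items() if share >= threshold)


# Hypothetical records for illustration only
sample_records = [
    {"model": "model-a", "cost_usd": 3.0},
    {"model": "model-b", "cost_usd": 1.0},
]
print(spend_share_by_model(sample_records))  # {'model-a': 0.75, 'model-b': 0.25}
print(models_to_review(sample_records))      # ['model-a']
```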

Step 7: Handle Errors & Retries

Implement robust error handling:

import time

def resilient_request(prompt: str, max_retries: int = 3):
    """
    Make request with exponential backoff for rate limits.
    """
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gemini/gemini-2.5-flash",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content

        except Exception as e:
            error_code = getattr(e, 'code', None)

            if error_code == 'rate_limit_exceeded':
                # Exponential backoff: 1s, 2s, 4s, ...
                wait_time = 2 ** attempt
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)

            elif error_code == 'insufficient_credits':
                # 402: balance or cap reached. Escalate to a human to top up
                # (or, if enabled, Auto Top-up in the console refills the balance).
                # send_alert is a placeholder for your own notification hook.
                send_alert("Insufficient credits. Please add funds.")
                raise

            elif error_code == 'model_unavailable':
                # Fall back to an alternative model (see Pattern 2: Fallback Chain)
                return fallback_request(prompt)

            else:
                raise

    raise Exception(f"Failed after {max_retries} attempts")
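
The backoff schedule itself can be isolated and tuned. The sketch below shows a common variation, capped exponential backoff with optional random jitter, which helps when many agents share a rate limit and would otherwise retry in lockstep. The function name and the `base`, `cap`, and `jitter` parameters are illustrative choices, not SkillBoss recommendations.

```python
import random


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0,
                   jitter: bool = True) -> list[float]:
    """Return one wait time (in seconds) per retry attempt."""
    delays = []
    for attempt in range(max_retries):
        # Double the ceiling each attempt, but never exceed the cap
        ceiling = min(cap, base * 2 ** attempt)
        # With jitter, wait a random time between 0 and the ceiling
        delays.append(random.uniform(0, ceiling) if jitter else ceiling)
    return delays


print(backoff_delays(5, jitter=False))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Each value can be passed to `time.sleep` in place of the fixed `2 ** attempt` wait.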

Common Agent Patterns

Pattern 1: Batch Processing

def batch_process(items: list, model: str):
    """Process items in batches to optimize cost."""

    results = []
    for i in range(0, len(items), 100):
        batch = items[i:i+100]

        # Single API call for batch
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": f"Process these items: {batch}"
            }]
        )

        # parse_batch_response is a placeholder: split the reply into per-item results
        results.extend(parse_batch_response(response))

    return results
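
The pattern above leaves the prompt format and the parsing step open. One possible shape, sketched here as an assumption rather than a prescribed format, is to number the items, ask for a JSON array in reply, and verify the count when parsing. `build_batch_prompt` and `parse_batch_text` are hypothetical helpers; the latter works on the reply text, for example `response.choices[0].message.content`.

```python
import json


def build_batch_prompt(batch: list[str]) -> str:
    """Number the items and request one JSON array entry per item."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(batch, start=1))
    return (
        "Process each item below. Reply with only a JSON array of strings, "
        f"one result per item, in the same order ({len(batch)} total).\n\n"
        f"{numbered}"
    )


def parse_batch_text(text: str, expected: int) -> list[str]:
    """Parse the reply and check that no item was dropped or added."""
    results = json.loads(text)
    if not isinstance(results, list) or len(results) != expected:
        raise ValueError(f"expected a JSON array of {expected} results")
    return results


print(parse_batch_text('["ok-1", "ok-2"]', expected=2))  # ['ok-1', 'ok-2']
```

Checking the count catches the most common batch failure, where the model silently skips or merges items.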

Pattern 2: Fallback Chain

def fallback_request(prompt: str):
    """Try models in order of cost until one succeeds."""

    models = [
        "gemini/gemini-2.5-flash",    # Try cheapest first
        "deepseek/deepseek-r1",       # Fallback to medium
        "claude-4-5-sonnet"   # Fallback to expensive
    ]

    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content
        except Exception:
            # This model failed; try the next one
            continue

    raise Exception("All models failed")
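
A variation on this pattern keeps the reason each model failed, which makes the final error easier to diagnose. In this sketch the API call is passed in as a function, so the chain can be exercised without network access; `first_success` and its `call` parameter are illustrative names.

```python
from typing import Callable


def first_success(models: list[str],
                  call: Callable[[str], str]) -> tuple[str, str]:
    """Return (model, result) from the first model whose call succeeds."""
    errors = {}
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:
            # Record why this model failed, then try the next one
            errors[model] = repr(exc)
    raise RuntimeError(f"All models failed: {errors}")


def fake_call(model: str) -> str:
    """Stand-in for a real API call: the first model is 'down'."""
    if model == "gemini/gemini-2.5-flash":
        raise ConnectionError("unavailable")
    return f"answer from {model}"


print(first_success(["gemini/gemini-2.5-flash", "deepseek/deepseek-r1"], fake_call))
# ('deepseek/deepseek-r1', 'answer from deepseek/deepseek-r1')
```

Returning the model name alongside the result also lets the caller log which tier actually served the request.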

Pattern 3: Quality Check with Retry

def quality_checked_request(prompt: str):
    """Retry with better model if quality insufficient."""

    # Try cheap model first
    response = cost_aware_request(prompt, "basic")

    # Check quality (agent-defined criteria)
    if quality_score(response) < 0.7:
        # Retry with better model
        response = cost_aware_request(prompt, "high")

    return response

Next Steps

Budget Management: Hard spending caps and per-key limits

MCP Integration: Install SkillBoss MCP server

Cost Optimization: Save 70%+ with smart model routing

Monitoring: Track and optimize your spending

Agent Support

Autonomous agents: API-only support. Check status.skillboss.co for uptime.

Semi-autonomous agents: Email dev@skillboss.co (a human will respond within 24 hours).

Emergency: If balance is zero and blocking operations, email dev@skillboss.co.