Best Practices

Optimize costs, improve performance, secure API keys, implement reliability patterns, and deploy SkillBoss-powered AI agents safely to production.

Cost Optimization

1. Choose the Right Model

Different models have vastly different costs. Match the model to your use case:

By Use Case
| Use Case | Recommended Model | Why |
|---|---|---|
| Complex reasoning | claude-4-5-sonnet | Best quality-to-cost ratio |
| Simple tasks | gemini-2.5-flash or gpt-4o-mini | 10-20x cheaper |
| Code generation | deepseek/deepseek-v3 | Excellent for code, low cost |
| Long documents | claude-4-5-sonnet | 200K context window |
| Ultra-fast responses | gemini-2.5-flash | Lowest latency |
By Cost
// Most Expensive → Cheapest

// Premium ($$$$)
model: "gpt-5"                    // ~$15/1M tokens

// High Quality ($$$)
model: "claude-4-5-sonnet"        // ~$3/1M input, $15/1M output

// Balanced ($$)
model: "gpt-4o"                   // ~$2.50/1M input, $10/1M output
model: "gemini-2.0-pro"           // ~$1.25/1M input, $5/1M output

// Economy ($)
model: "gemini-2.5-flash"         // ~$0.10/1M input, $0.40/1M output
model: "gpt-4o-mini"              // ~$0.15/1M input, $0.60/1M output
model: "deepseek/deepseek-v3"     // ~$0.27/1M tokens (cache-enabled)
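A quick sanity check on these rates: per-request cost is just tokens times price per million. The sketch below hard-codes the illustrative figures from the list above (not live pricing):

```typescript
// Rough per-request cost estimator. Rates are the illustrative
// $/1M-token figures listed above, not live pricing.
const rates: Record<string, { input: number; output: number }> = {
  'claude-4-5-sonnet': { input: 3.0, output: 15.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'gemini-2.5-flash': { input: 0.1, output: 0.4 },
}

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const r = rates[model]
  if (!r) throw new Error(`Unknown model: ${model}`)
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000
}

// 10K input + 1K output tokens:
// sonnet: (10000*3 + 1000*15)/1e6  = $0.045
// flash:  (10000*0.1 + 1000*0.4)/1e6 = $0.0014 (~32x cheaper)
```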

2. Optimize Token Usage

❌ Wasteful:

const prompt = `
I need you to analyze this very long text and provide a summary.
Please make sure the summary is comprehensive and covers all the key points.
Here's the text:
${veryLongText}

Please provide:
1. A summary
2. Key takeaways
3. Action items

Thank you!
`

✅ Efficient:

const prompt = `Summarize this text with key takeaways and action items:\n\n${veryLongText}`
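To compare prompt variants before shipping, a rough character-based estimate is usually enough. This chars/4 rule is a heuristic, not a real tokenizer:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Use this only to compare prompt variants, never for billing.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

const wasteful = `I need you to analyze this very long text and provide a summary.
Please make sure the summary is comprehensive and covers all the key points.
Here's the text:
`
const efficient = `Summarize this text with key takeaways and action items:\n\n`

// The instruction overhead alone shrinks substantially
console.log(estimateTokens(wasteful), estimateTokens(efficient))
```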

Saved: ~50 tokens per request

Don't waste credits on unused output:

// ❌ Wasteful (defaults to 4096 tokens)
await client.chat.completions.create({
  model: 'claude-4-5-sonnet',
  messages: [{role: 'user', content: 'Say hi'}]
})

// ✅ Efficient (only generate what you need)
await client.chat.completions.create({
  model: 'claude-4-5-sonnet',
  messages: [{role: 'user', content: 'Say hi'}],
  max_tokens: 20  // Enough for a greeting
})

DeepSeek supports prompt caching for repeated prefixes:

// First call: Pays full price
const response1 = await client.chat.completions.create({
  model: 'deepseek/deepseek-v3',
  messages: [
    {role: 'system', content: longSystemPrompt},  // Cached
    {role: 'user', content: 'Question 1'}
  ]
})

// Second call: System prompt is cached (98% cheaper!)
const response2 = await client.chat.completions.create({
  model: 'deepseek/deepseek-v3',
  messages: [
    {role: 'system', content: longSystemPrompt},  // From cache
    {role: 'user', content: 'Question 2'}
  ]
})

Savings: ~98% on cached tokens
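The arithmetic behind that figure can be sketched as follows; the ~$0.27/1M rate and 98% cache discount are the illustrative numbers above, not guaranteed pricing:

```typescript
// Back-of-envelope savings from prompt caching. Rate and discount are
// the illustrative figures above.
function cachedCostUSD(
  promptTokens: number,
  cachedTokens: number,
  pricePerMillion = 0.27,
  cacheDiscount = 0.98
): number {
  const fresh = promptTokens - cachedTokens
  const cached = cachedTokens * (1 - cacheDiscount)
  return ((fresh + cached) * pricePerMillion) / 1_000_000
}

// 10K-token system prompt:
const firstCall = cachedCostUSD(10_000, 0)       // full price
const laterCall = cachedCostUSD(10_000, 10_000)  // fully cached prefix
```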

3. Batch Similar Requests

// ❌ Inefficient: Multiple API calls
for (const product of products) {
  const description = await generateDescription(product)
}

// ✅ Efficient: Batch in one call
const prompt = `Generate descriptions for these products:\n${products.map(p => `- ${p.name}`).join('\n')}`
const response = await client.chat.completions.create({...})
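For very large product lists, one mega-prompt can exceed the context window; chunking into fixed-size batches bounds each call (the batch size of 10 is an arbitrary example, tune it to your items' length):

```typescript
// Split a large list into batches so each prompt stays well under the
// model's context window: N items in batches of `size` -> ceil(N/size) calls.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// 25 products in batches of 10 -> 3 API calls instead of 25
const batches = chunk(Array.from({ length: 25 }, (_, i) => `Product ${i}`), 10)
```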

Performance

1. Use Streaming for Better UX

// ❌ Slow: Wait for entire response
const response = await client.chat.completions.create({
  model: 'claude-4-5-sonnet',
  messages: [{role: 'user', content: 'Write a long story'}]
})
// User waits 10-30 seconds with no feedback

// ✅ Fast: Stream tokens as generated
const stream = await client.chat.completions.create({
  model: 'claude-4-5-sonnet',
  messages: [{role: 'user', content: 'Write a long story'}],
  stream: true
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '')
  // User sees words appear immediately
}

2. Parallel Requests

// ❌ Sequential: Takes 3x as long
const task1 = await client.chat.completions.create({...})
const task2 = await client.chat.completions.create({...})
const task3 = await client.chat.completions.create({...})

// ✅ Parallel: 3x faster
const [task1, task2, task3] = await Promise.all([
  client.chat.completions.create({...}),
  client.chat.completions.create({...}),
  client.chat.completions.create({...})
])
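The effect is easy to verify with mocked tasks standing in for API calls; setTimeout here simulates network latency:

```typescript
// Three independent tasks run concurrently under Promise.all: total
// wall time is roughly max(delays), not sum(delays).
function fakeTask(result: string, ms: number): Promise<string> {
  return new Promise(resolve => setTimeout(() => resolve(result), ms))
}

async function runParallel(): Promise<string[]> {
  return Promise.all([
    fakeTask('summary', 100),
    fakeTask('translation', 100),
    fakeTask('keywords', 100),
  ])
}
```

Note that Promise.all rejects as soon as any task fails; use Promise.allSettled if partial results are acceptable.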

3. Choose Fast Models

| Model | Typical Latency | Best For |
|---|---|---|
| gemini-2.5-flash | ~500ms | Real-time chat, autocomplete |
| gpt-4o-mini | ~800ms | Quick responses |
| claude-3-5-haiku | ~1s | Balanced speed/quality |
| claude-4-5-sonnet | ~2-4s | Quality over speed |

Security

1. Never Expose API Keys

⚠️ Never include your API key in:

  • Public Git repositories
  • Client-side JavaScript
  • Mobile app binaries
  • URL parameters
  • Logs or error messages

// ❌ DANGEROUS: Client-side usage
'use client'  // This runs in the browser!
export function ChatComponent() {
  const client = new OpenAI({
    baseURL: 'https://api.skillboss.co/v1',
    apiKey: process.env.NEXT_PUBLIC_SKILLBOSS_KEY  // ❌ EXPOSED TO USERS!
  })
}

// ✅ SAFE: Server-side only
// app/api/chat/route.ts
export async function POST(req: Request) {
  const client = new OpenAI({
    baseURL: 'https://api.skillboss.co/v1',
    apiKey: process.env.SKILLBOSS_KEY  // ✅ Server-only, secure
  })

  const response = await client.chat.completions.create({...})
  return Response.json(response)
}

2. Use Environment Variables

# .env (add to .gitignore!)
SKILLBOSS_KEY=sk-abc123...

# .gitignore
.env
.env.local

3. Implement Rate Limiting

Protect yourself from abuse:

import rateLimit from 'express-rate-limit'

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 100,  // Limit each IP to 100 requests per window
  message: 'Too many requests from this IP'
})

app.use('/api/', limiter)
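Under the hood this is just a per-key counter over a time window. A minimal, dependency-free sketch of the same fixed-window idea (express-rate-limit adds headers, pluggable stores, and more):

```typescript
// Fixed-window rate limiter: each key gets `max` requests per
// `windowMs`; the counter resets when a new window starts.
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>()

  constructor(private windowMs: number, private max: number) {}

  allow(key: string, now = Date.now()): boolean {
    const entry = this.hits.get(key)
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now })
      return true
    }
    entry.count++
    return entry.count <= this.max
  }
}
```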

Reliability

1. Implement Retry Logic

import { OpenAI } from 'openai'

async function callWithRetry<T>(
  func: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await func()
    } catch (error: any) {
      // Don't retry on client errors (4xx)
      if (error.status >= 400 && error.status < 500) {
        throw error
      }

      // Last attempt - rethrow
      if (i === maxRetries - 1) {
        throw error
      }

      // Exponential backoff: 1s, 2s, 4s... capped at 10s
      const delay = Math.min(1000 * (2 ** i), 10000)
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
  // Unreachable when maxRetries >= 1, but satisfies TypeScript's return-type check
  throw new Error('Retry limit reached')
}

// Usage
const response = await callWithRetry(() =>
  client.chat.completions.create({...})
)

2. Implement Fallback Models

const modelTiers = [
  'claude-4-5-sonnet',  // Try premium first
  'gpt-4o',             // Fallback to GPT
  'gemini-2.5-flash'    // Fallback to Gemini
]

async function callWithFallback(messages: any[]) {
  for (const model of modelTiers) {
    try {
      return await client.chat.completions.create({
        model,
        messages
      })
    } catch (error: any) {
      if (error.status === 503) {
        // Provider down, try next
        continue
      }
      throw error
    }
  }
  throw new Error('All providers unavailable')
}

3. Monitor Balance

const response = await client.chat.completions.create({...})

// Check for low balance warning
if (response._balance_warning) {
  console.warn(`Low balance: ${response._remaining_credits} credits`)

  // Send notification
  await sendEmail({
    to: 'admin@company.com',
    subject: 'SkillBoss Balance Low',
    body: `Only ${response._remaining_credits} credits remaining`
  })

  // Optionally trigger auto-recharge
}

Production Checklist

Before deploying to production:

  • API keys stored in environment variables
  • Keys not committed to Git
  • Server-side API calls only (not client-side)
  • Rate limiting implemented
  • Input validation on user prompts
  • Retry logic with exponential backoff
  • Fallback models configured
  • Error handling for all error types
  • Timeout handling
  • Health check endpoint
  • Error tracking (Sentry, etc.)
  • Usage monitoring
  • Balance alerts configured
  • Latency monitoring
  • Cost tracking per feature
  • Right model chosen for each use case
  • max_tokens set appropriately
  • Prompt caching utilized where applicable
  • Auto-recharge configured
  • Budget alerts set
  • Streaming enabled for long responses
  • Parallel requests where possible
  • Appropriate model for latency requirements
  • Caching implemented for repeated queries

Common Patterns

Chat Interface

// Store conversation history
const messages = [
  {role: 'system', content: 'You are a helpful assistant.'}
]

async function chat(userMessage: string) {
  // Add user message
  messages.push({role: 'user', content: userMessage})

  // Get response
  const response = await client.chat.completions.create({
    model: 'claude-4-5-sonnet',
    messages,
    max_tokens: 500
  })

  // Add assistant response to history
  const assistantMessage = response.choices[0].message.content
  messages.push({role: 'assistant', content: assistantMessage})

  return assistantMessage
}
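One caveat with this pattern: the history grows without bound, and every past turn is re-billed on each call. A sketch of trimming to the system prompt plus the most recent turns (maxTurns is an arbitrary example value):

```typescript
// Keep the system message(s) plus only the most recent turns so the
// prompt size -- and cost -- stays bounded as the conversation grows.
interface Message { role: string; content: string }

function trimHistory(messages: Message[], maxTurns = 10): Message[] {
  const system = messages.filter(m => m.role === 'system')
  const rest = messages.filter(m => m.role !== 'system')
  return [...system, ...rest.slice(-maxTurns)]
}
```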

Function Calling

const response = await client.chat.completions.create({
  model: 'gpt-5',
  messages: [{role: 'user', content: 'What\'s the weather in SF?'}],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather',
      parameters: {
        type: 'object',
        properties: {
          location: {type: 'string'},
          unit: {type: 'string', enum: ['celsius', 'fahrenheit']}
        },
        required: ['location']
      }
    }
  }]
})

// Handle function call
if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0]
  const weather = await getWeather(JSON.parse(toolCall.function.arguments))
  // Send function result back...
}
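"Send function result back" means appending a tool message that echoes the tool_call_id, then calling the API again for the model's final answer. Sketched below with a mocked tool call (call_123 and the weather payload are made up for illustration):

```typescript
// The tool call below is mocked; in practice it comes from
// response.choices[0].message.tool_calls[0].
const toolCall = {
  id: 'call_123',
  function: { name: 'get_weather', arguments: '{"location":"SF"}' },
}

const args = JSON.parse(toolCall.function.arguments)
const toolResult = { temperature: 18, unit: 'celsius' }  // pretend getWeather(args)

// Tool results go back as role 'tool', referencing the tool_call_id,
// appended after the assistant message that requested the call.
const followUp = [
  { role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(toolResult) },
]
// Append followUp to the conversation and call create() again.
```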

Need Help?

  • 📚 API Reference: complete API documentation
  • 📄 Troubleshooting: common issues and solutions
