Chat (Gemini native)

Gemini 3 Flash API

Gemini 3 Flash is the production-default Gemini — same capabilities as Pro with lower latency and cost. 1M token context, multimodal in/out.

URL model: gemini-3-flash-preview1M tokens
Wholesale key required. This endpoint accepts wholesale keys only. Have access? Get your key on the wholesale dashboard. Not on wholesale yet? Email admin@skillboss.co with your use case — we'll get you provisioned.

Quickstart

This model uses the Gemini Developer API native format — the model id is in the URL path, not the request body. Compatible with the Gemini SDK (@google/generative-ai), Gemini CLI, and opencode by overriding the base URL.

bashcurl "https://api.skillboss.co/v1beta/models/gemini-3-flash-preview:generateContent" \
  -H "x-goog-api-key: $SKILLBOSS_WHOLESALE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {"parts": [{"text": "Hello"}]}
    ]
  }'

Your first 200 response is the fastest way to confirm setup. From there, swap in your real prompt and tune the model-specific parameters listed below.

Authentication

Every request must include your wholesale key. The header name depends on the endpoint — match the SDK you're using:

bashx-goog-api-key: $SKILLBOSS_WHOLESALE_KEY

Gemini's native API uses x-goog-api-key (Gemini SDK + opencode default) or ?key= as a query parameter (Gemini CLI). Both work — pick whichever matches your client. Standard (non-wholesale) console keys are rejected at the gateway with 401.

Code examples

Python
pythonimport os
from google import genai

client = genai.Client(
    api_key=os.environ["SKILLBOSS_WHOLESALE_KEY"],
    http_options={"base_url": "https://api.skillboss.co"},
)

resp = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Hello",
)
print(resp.text)
JavaScript / TypeScript
typescriptimport { GoogleGenerativeAI } from "@google/generative-ai";

// Point the Gemini SDK at SkillBoss by overriding the base URL.
const genAI = new GoogleGenerativeAI(process.env.SKILLBOSS_WHOLESALE_KEY, {
  baseUrl: "https://api.skillboss.co",
});

const model = genAI.getGenerativeModel({ model: "gemini-3-flash-preview" });
const result = await model.generateContent("Hello");
console.log(result.response.text());

Parameters

Gemini Developer API parameters. The model id is in the URL path (not the body) — see the curl example above. Body fields documented here.

NameTypeRequiredDescription
contentsarrayrequiredConversation parts. Each entry: { role: "user"|"model", parts: [{ text: "..." } | { inlineData: { mimeType, data } } | { fileData: { mimeType, fileUri } }] }.
generationConfigobjectoptionalGeneration tuning. Fields: temperature (0-2), topP (0-1), topK (integer), maxOutputTokens, candidateCount (default 1), stopSequences (array), responseMimeType ("text/plain" | "application/json"), responseSchema (OpenAPI-subset schema for strict structured JSON output).
systemInstructionobjectoptionalSystem prompt as { parts: [{ text: "..." }] } (or just { text: "..." } — server accepts both).
toolsarrayoptionalGemini tool/function-calling declarations. Function schema: [{ functionDeclarations: [{ name, description, parameters: <JSON Schema> }] }]. Also supports googleSearch + codeExecution built-in tools.
toolConfigobjectoptional{ functionCallingConfig: { mode: "AUTO" | "ANY" | "NONE", allowedFunctionNames: [...] } } — control when/which functions can be called.
safetySettingsarrayoptionalPer-category harm thresholds (overrides defaults). [{ category: "HARM_CATEGORY_*", threshold: "BLOCK_NONE"|"BLOCK_LOW_AND_ABOVE"|… }].
cachedContentstringoptionalFull cached-content resource name (projects/<id>/locations/<region>/cachedContents/<id>) — reuses a pre-cached system prompt + long context for 75% discount on cached tokens. Server passes through as-is.

Endpoint

MethodPOST
URLhttps://api.skillboss.co/v1beta/models/gemini-3-flash-preview:generateContent
Auth headerx-goog-api-key: $SKILLBOSS_WHOLESALE_KEY
Content-Typeapplication/json
StreamingUse the :streamGenerateContent path (instead of :generateContent) for SSE streaming.

Errors

The API uses standard HTTP status codes:

200OKRequest succeeded.
400Bad RequestInvalid model, missing required field, or malformed JSON.
401UnauthorizedMissing or invalid wholesale key. Non-wholesale console keys are rejected here.
402Insufficient CreditsWholesale balance too low — top up on the wholesale dashboard.
429Rate LimitedToo many requests. Back off with exponential delay.
500Server ErrorTransient upstream issue. Safe to retry.
503Upstream UnavailableDiscount pool capacity issue (lite tier). Retry or fall back to standard tier.

Pricing

Wholesale pricing is your account-specific discount × vendor list price. Discount rate depends on your contract — see the live numbers on the wholesale dashboard. The dashboard shows your current cost per 1M tokens (or per image / per second) for every model.

No platform markup on standard token billing. Volume tiers + monthly caps are configurable per key.