
Ecommerce: Enrich 10,000 Product Rows With Scraped Vendor Data

Agent pattern for bulk product feed enrichment: scrape each vendor page, rewrite with Claude 4.5 Haiku, cap daily spend via wallet rules.

Pattern type: Bulk CSV enrichment pipeline
Tools used: firecrawl/scrape, claude-4-5-haiku, /api/wallet/rules, /api/batch
Estimated cost per 1,000 products: ~$55

What this pattern does

A team running a product catalog takes a CSV of 10,000 SKUs, each with a vendor URL and thin metadata. The agent scrapes each vendor product page, then uses Claude 4.5 Haiku (the cheap, high-throughput tier) to rewrite the description, extract structured attributes (color, material, weight), and tag the category. Output is a new CSV ready for upload to Shopify, BigCommerce, or a PIM.
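
For concreteness, here is what a single row transformation might look like. The field values below are hypothetical; the five appended columns match the JSON schema the agent requests from Haiku:

```typescript
// Hypothetical input row from products.csv: thin metadata plus a vendor URL.
const before = {
  sku: "TSH-0421",
  vendor_url: "https://vendor.example.com/p/tsh-0421",
  title: "tee blue",
};

// The same row after scrape + Haiku enrichment: original columns are
// preserved and the five extracted fields are appended.
const after = {
  ...before,
  description: "Soft-washed cotton tee in a classic fit.",
  color: "blue",
  material: "100% cotton",
  weight: "180 g",
  category: "Apparel > T-Shirts",
};
```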

Haiku is intentional here: for "rewrite this scraped page as a clean product description", Opus is overkill and 5x more expensive.

Workflow

1. Load products.csv (10,000 rows)
2. Chunk into batches of 200 rows
3. /api/batch  -> firecrawl/scrape for each vendor_url
4. For each scraped page:
     /v1/run model=claude-4-5-haiku
       -> { description, color, material, weight, category }
5. Append to enriched.csv
6. /api/wallet/rules caps agent at $50/day, hard-stops if exceeded

Runnable implementation

import fs from "fs";
import { parse } from "csv-parse/sync";
import { stringify } from "csv-stringify/sync";

const SB = "https://api.heybossai.com/v1";
const KEY = process.env.SKILLBOSS_API_KEY!;

const rows = parse(fs.readFileSync("products.csv"), { columns: true });
const out: any[] = [];

for (const chunk of chunks(rows, 200)) {
  // One batch call scrapes all 200 vendor pages; the idempotency key
  // (derived from the chunk's first SKU) makes crash-retries safe.
  const scraped = await fetch("https://www.skillboss.co/api/batch", {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      requests: chunk.map((r: any) => ({
        model: "firecrawl/scrape",
        inputs: { url: r.vendor_url },
      })),
      idempotency_key: `feed-${chunk[0].sku}`,
    }),
  }).then((r) => r.json());

  for (let i = 0; i < chunk.length; i++) {
    // Cap the scraped markdown at ~8k chars to bound input tokens.
    const md = scraped.results[i]?.markdown?.slice(0, 8000);
    if (!md) {
      out.push(chunk[i]); // scrape failed: keep the row un-enriched rather than dropping it
      continue;
    }

    const enriched = await fetch(`${SB}/run`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${KEY}`,
        "Content-Type": "application/json",
        "X-Max-Cost-Usd": "0.01", // per-call cost cap (see guardrails below)
      },
      body: JSON.stringify({
        model: "claude-4-5-haiku",
        messages: [
          {
            role: "system",
            content:
              "Return strict JSON: { description, color, material, weight, category }",
          },
          { role: "user", content: md },
        ],
      }),
    }).then((r) => r.json());

    // Haiku occasionally returns non-JSON; fall back to the original row
    // instead of crashing mid-run.
    let fields: Record<string, string> = {};
    try {
      fields = JSON.parse(enriched.choices[0].message.content);
    } catch {}
    out.push({ ...chunk[i], ...fields });
  }
}

fs.writeFileSync("enriched.csv", stringify(out, { header: true }));

// Yield fixed-size slices of the input array.
function* chunks<T>(arr: T[], n: number) {
  for (let i = 0; i < arr.length; i += n) yield arr.slice(i, i + n);
}

Cost breakdown (per 1,000 products)

Assumes ~6k input tokens + 300 output tokens per product (scraped page is long, output is structured JSON).

Call               | Skill            | Unit cost       | Units    | Subtotal
Scrape vendor page | firecrawl/scrape | $0.0253/request | 1,000    | $25.30
Enrich (input)     | claude-4-5-haiku | $4/1M tokens    | 6M in    | $24.00
Enrich (output)    | claude-4-5-haiku | $20/1M tokens   | 0.3M out | $6.00

Total per 1,000 products: ~$55
Total for the full 10,000-row feed: ~$550 one-shot
If you skip the scrape step (you already have vendor HTML): ~$30 per 1,000.
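
The per-product arithmetic behind these totals is straightforward, using the unit prices and token assumptions stated above:

```typescript
// Unit prices from the cost table above.
const SCRAPE_PER_REQ = 0.0253;      // firecrawl/scrape, $/request
const HAIKU_IN_PER_TOK = 4 / 1e6;   // $4 per 1M input tokens
const HAIKU_OUT_PER_TOK = 20 / 1e6; // $20 per 1M output tokens

// Assumed per-product token load: ~6k in, ~300 out.
const perProduct =
  SCRAPE_PER_REQ + 6000 * HAIKU_IN_PER_TOK + 300 * HAIKU_OUT_PER_TOK;

const perThousand = perProduct * 1000; // ≈ $55.30 with the scrape step
```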

Guardrails for this pattern

  • Per-call cap: X-Max-Cost-Usd: 0.01 keeps a single weird page from eating $0.50.
  • Daily limit: set max_daily_spend_usd: 50 via /api/wallet/rules so an infinite loop cannot drain the wallet overnight.
  • Sub-wallet: put the enrichment agent on its own wallet via /api/wallet/sub-wallets; if something breaks you know exactly who spent what.
  • Idempotency: use idempotency_key: "feed-<sku>" so retries after a crash do not double-charge.
  • Audit: every enrichment returns a signed receipt you can attach to the row.
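
The daily-limit guardrail is a one-time setup call before the run starts. A minimal sketch, assuming a POST to /api/wallet/rules accepting the max_daily_spend_usd field named above; the "action" field is an assumption about the request shape, so check the API reference before relying on it:

```typescript
// Build the rule payload separately so it can be inspected before sending.
// "action" is an assumed field name, not confirmed by this page.
function dailyCapRule(maxUsd: number) {
  return { max_daily_spend_usd: maxUsd, action: "hard_stop" };
}

// One-time setup call before kicking off the enrichment run.
async function applyDailyCap(maxUsd: number) {
  return fetch("https://www.skillboss.co/api/wallet/rules", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SKILLBOSS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(dailyCapRule(maxUsd)),
  });
}
```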

Going further

  • Add a second pass with claude-4-6-sonnet only for rows where Haiku returned low-confidence categories.
  • Use /api/estimate before kicking off the full 10k-row run so finance can sign off.
  • Cache scrape results keyed by URL for 7 days — re-runs become free.
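
The scrape cache from the last bullet can be as simple as a URL-keyed map with a TTL check; persist it to disk or a KV store between runs so re-runs skip the scrape charge. A minimal in-memory sketch (names are illustrative):

```typescript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

type CacheEntry = { markdown: string; fetchedAt: number };
const scrapeCache = new Map<string, CacheEntry>();

// Return the cached scrape if it is younger than 7 days, else undefined
// (signalling that the URL must be re-scraped).
function cachedScrape(url: string, now = Date.now()): string | undefined {
  const hit = scrapeCache.get(url);
  if (hit && now - hit.fetchedAt < WEEK_MS) return hit.markdown;
  return undefined;
}

function storeScrape(url: string, markdown: string, now = Date.now()) {
  scrapeCache.set(url, { markdown, fetchedAt: now });
}
```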

Try this pattern yourself with a free $0.50 trial:

curl -X POST https://www.skillboss.co/api/try/anonymous-wallet \
  -H "Content-Type: application/json" -d '{}'


See also: /use/firecrawl-scrape | /use/claude-4-5-haiku


SkillBoss is an independent multi-provider gateway, not affiliated with any vendor listed. Trademarks belong to their owners. See our IP policy.