Build a Voice-to-Article Generator

Record your voice, get a polished article. Automatic transcription, AI editing, and text-to-speech for proofreading.

What You'll Build

A web app that converts voice recordings into well-structured articles with automatic editing and AI-generated voice playback.

Why This Is Powerful

  • Turn thoughts into articles instantly
  • Perfect for content creators
  • Complete audio → text → audio pipeline
  • Built on production-grade speech and language models

Prerequisites

  • A SkillBoss account and API key
  • Working knowledge of Next.js
  • A browser with microphone access enabled

Architecture

  • Input: Audio recording
  • Skills: openai/whisper-1, openai/gpt-4o, elevenlabs/eleven-turbo-v2
  • Output: Polished article + AI voiceover
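
The three skills form a chain in which each stage's output feeds the next. A minimal sketch of that data flow follows. The stages are passed in as plain functions, so nothing here depends on a particular API client, and the names `Stages`, `PipelineResult`, and `runPipeline` are illustrative rather than part of SkillBoss:

```typescript
// Hypothetical types for illustration; not part of the SkillBoss API.
interface Stages<TAudio> {
  transcribe: (recording: TAudio) => Promise<string> // openai/whisper-1
  polish: (transcript: string) => Promise<string> // openai/gpt-4o
  speak: (article: string) => Promise<TAudio> // elevenlabs/eleven-turbo-v2
}

interface PipelineResult<TAudio> {
  transcript: string
  article: string
  voiceover: TAudio
}

// Runs the stages in order: audio -> raw text -> polished text -> audio.
async function runPipeline<TAudio>(
  recording: TAudio,
  stages: Stages<TAudio>,
): Promise<PipelineResult<TAudio>> {
  const transcript = await stages.transcribe(recording)
  const article = await stages.polish(transcript)
  const voiceover = await stages.speak(article)
  return { transcript, article, voiceover }
}
```

In the app built below, each stage corresponds to one API route, and the browser plays the role of `runPipeline`.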


Step 1: Set Up Project

Create a Next.js app and install the OpenAI SDK. The examples below use it as the SkillBoss client by pointing it at the SkillBoss base URL:

npx create-next-app@latest voice-article --typescript
cd voice-article
npm install openai
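
The routes in the following steps read the API key from the `SKILLBOSS_API_KEY` environment variable, which Next.js can load from a `.env.local` file in the project root. One way to fail fast when the key is missing is a small guard like the sketch below. The helper name `requireEnv` is hypothetical and not part of Next.js or the OpenAI SDK:

```typescript
// Hypothetical helper for illustration; not part of Next.js or the OpenAI SDK.
// Takes the environment as an argument so it can be checked without a server.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing environment variable: ${name}`)
  }
  return value
}
```

A route could then pass `requireEnv(process.env, 'SKILLBOSS_API_KEY')` as the `apiKey` option, so a missing key produces a clear error message.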

Step 2: Create Transcription API

Convert audio to text with Whisper:

// app/api/transcribe/route.ts
import { OpenAI } from 'openai'

const client = new OpenAI({
  baseURL: 'https://api.skillboss.co/v1',
  apiKey: process.env.SKILLBOSS_API_KEY,
})

export async function POST(req: Request) {
  const formData = await req.formData()
  const audio = formData.get('audio') as File

  const transcription = await client.audio.transcriptions.create({
    file: audio,
    model: 'openai/whisper-1',
  })

  return Response.json({ text: transcription.text })
}
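
The route above casts `formData.get('audio')` to `File` without checking it, so a missing or empty upload only fails once it reaches the transcription call. A hedged sketch of a pre-check follows. The helper `checkAudioUpload` and the `UploadLike` shape are hypothetical, and the 25 MB cap mirrors OpenAI's documented limit for its own transcription endpoint; whether SkillBoss enforces the same limit is an assumption to confirm:

```typescript
// Assumed limit for illustration; confirm the limit your provider enforces.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024

// Minimal shape shared by File and Blob, which keeps the check easy to test.
interface UploadLike {
  size: number
  type: string
}

// Returns an error message for an unusable upload, or null if it looks fine.
function checkAudioUpload(entry: UploadLike | string | null): string | null {
  if (entry === null || typeof entry === 'string') {
    return 'Expected an audio file in the "audio" form field'
  }
  if (entry.size === 0) return 'The recording is empty'
  if (entry.size > MAX_AUDIO_BYTES) return 'The recording is too large'
  if (!entry.type.startsWith('audio/')) {
    return `Unsupported content type: ${entry.type || 'unknown'}`
  }
  return null
}
```

In the route, the result of `checkAudioUpload(formData.get('audio'))` could be returned as a 400 response via `Response.json({ error }, { status: 400 })` before any transcription request is made.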

Step 3: Create Article Polish API

Clean up and structure the transcript with GPT-4o:

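// app/api/polish/route.ts
import { OpenAI } from 'openai'

// Each route file is its own module, so it needs its own client
const client = new OpenAI({
  baseURL: 'https://api.skillboss.co/v1',
  apiKey: process.env.SKILLBOSS_API_KEY,
})
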
export async function POST(req: Request) {
  const { rawText } = await req.json()

  const polished = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [{
      role: 'system',
      content: 'Convert this raw transcript into a well-structured article. Add headings, fix grammar, improve flow. Keep the author voice.'
    }, {
      role: 'user',
      content: rawText
    }],
  })

  return Response.json({
    article: polished.choices[0].message.content
  })
}
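
In the OpenAI SDK's types, `message.content` is `string | null`, and the `choices` array could in principle be empty, so the route above may return `null` as the article. A small guard, sketched here under the assumption that an empty result should be treated as an error, makes that case explicit. `extractArticle` and `ChoiceLike` are hypothetical names:

```typescript
// Minimal shape of the part of a chat completion that this helper reads.
interface ChoiceLike {
  message: { content: string | null }
}

// Returns the trimmed article text, or throws if the model produced none.
function extractArticle(choices: ChoiceLike[]): string {
  const content = choices[0]?.message.content
  if (content === null || content === undefined || content.trim() === '') {
    throw new Error('The model returned no article text')
  }
  return content.trim()
}
```

The route could call `extractArticle(polished.choices)` and return the result as `article`.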

Step 4: Create Text-to-Speech API

Generate audio playback with ElevenLabs:

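// app/api/tts/route.ts
import { OpenAI } from 'openai'

// Each route file is its own module, so it needs its own client
const client = new OpenAI({
  baseURL: 'https://api.skillboss.co/v1',
  apiKey: process.env.SKILLBOSS_API_KEY,
})
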
export async function POST(req: Request) {
  const { text } = await req.json()

  const audio = await client.audio.speech.create({
    model: 'elevenlabs/eleven-turbo-v2',
    voice: 'alloy',
    input: text,
  })

  const buffer = Buffer.from(await audio.arrayBuffer())

  return new Response(buffer, {
    headers: { 'Content-Type': 'audio/mpeg' }
  })
}
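
Speech endpoints commonly cap the input length per request, and a full article can exceed that cap. The limit for `elevenlabs/eleven-turbo-v2` through SkillBoss is not stated here, so the sketch below takes it as a parameter (`maxChars`) that you would need to confirm. It splits text at sentence boundaries where possible; `splitForSpeech` is a hypothetical helper:

```typescript
// Splits text into chunks of at most maxChars characters, breaking at
// sentence ends where possible. maxChars is a caller-supplied assumption.
function splitForSpeech(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error('maxChars must be positive')
  // Each match is a run of text plus its closing ., !, or ? and trailing spaces.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? []
  const chunks: string[] = []
  let current = ''
  for (const sentence of sentences) {
    // Start a new chunk if this sentence would overflow the current one.
    if (current !== '' && current.length + sentence.length > maxChars) {
      chunks.push(current.trim())
      current = ''
    }
    // A single sentence longer than maxChars is cut at fixed offsets.
    let rest = sentence
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars).trim())
      rest = rest.slice(maxChars)
    }
    current += rest
  }
  if (current.trim() !== '') chunks.push(current.trim())
  return chunks
}
```

Each chunk could then be sent to `client.audio.speech.create` in turn, with the resulting audio played back in order.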

Step 5: Build UI

Voice recorder + article display:

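// app/page.tsx
'use client'
import { useRef, useState } from 'react'

// POST a JSON body to one of the API routes
const postJson = (url: string, body: unknown) =>
  fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  })

export default function VoiceArticle() {
  const [recording, setRecording] = useState(false)
  const [article, setArticle] = useState('')
  const [audioUrl, setAudioUrl] = useState('')
  const recorderRef = useRef<MediaRecorder | null>(null)

  // Transcribe the recording, polish the transcript, then generate the voiceover
  const processAudio = async (blob: Blob) => {
    const form = new FormData()
    form.append('audio', blob, 'recording.webm') // assumes WebM output; varies by browser
    const transcribeRes = await fetch('/api/transcribe', { method: 'POST', body: form })
    const { text } = await transcribeRes.json()
    const polishRes = await postJson('/api/polish', { rawText: text })
    const { article: polished } = await polishRes.json()
    setArticle(polished)
    const ttsRes = await postJson('/api/tts', { text: polished })
    setAudioUrl(URL.createObjectURL(await ttsRes.blob()))
  }

  // First click starts recording; second click stops it and runs the pipeline
  const record = async () => {
    if (recording) {
      recorderRef.current?.stop()
      setRecording(false)
      return
    }
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
    const recorder = new MediaRecorder(stream)
    const chunks: Blob[] = []
    recorder.ondataavailable = (e) => chunks.push(e.data)
    recorder.onstop = () => {
      stream.getTracks().forEach((track) => track.stop()) // release the microphone
      processAudio(new Blob(chunks, { type: recorder.mimeType })).catch(console.error)
    }
    recorder.start()
    recorderRef.current = recorder
    setRecording(true)
  }

  return (
    <div className="max-w-3xl mx-auto p-8">
      <h1 className="text-3xl font-bold mb-8">Voice to Article</h1>

      <button
        onClick={record}
        className="bg-red-500 text-white rounded-full w-20 h-20 mb-8"
      >
        {recording ? '⏸️' : '🎤'}
      </button>

      {article && (
        <div className="prose lg:prose-xl">
          {/* Render as text: the model output is not sanitized HTML */}
          <div className="whitespace-pre-wrap">{article}</div>
          {audioUrl && <audio src={audioUrl} controls className="w-full mt-8" />}
        </div>
      )}
    </div>
  )
}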
