Build a Voice-to-Article Generator
Record your voice, get a polished article. Automatic transcription, AI editing, and text-to-speech for proofreading.
What You'll Build
A web app that converts voice recordings into well-structured articles with automatic editing and AI-generated voice playback.
Why This Is Powerful
- Turn thoughts into articles instantly
- Perfect for content creators
- Complete audio → text → audio pipeline
- Production-ready quality
Prerequisites
- SkillBoss account
- Next.js knowledge
- Browser audio permissions
Architecture
Input: Audio recording
Skills: openai/whisper-1, openai/gpt-4o, elevenlabs/eleven-turbo-v2
Output: Polished article + AI voiceover
Step 1: Setup Project
Create a Next.js app and install the OpenAI SDK, which this guide points at the SkillBoss API:
npx create-next-app@latest voice-article --typescript
cd voice-article
npm install openai
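The route handlers in the following steps read the API key from `process.env.SKILLBOSS_API_KEY`, so it has to be set before the server starts (in Next.js, typically in `.env.local`). A minimal sketch of a guard that fails fast with a readable message when the key is missing; `requireEnv` is a hypothetical helper introduced here for illustration, not part of the OpenAI SDK or SkillBoss:

```typescript
// Hypothetical helper (an assumption of this sketch, not an SDK function).
// Returns the named variable, or throws a descriptive error if it is unset or blank.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing environment variable ${name}; add it to .env.local`)
  }
  return value
}

// Assumed usage inside a route file:
// const apiKey = requireEnv(process.env, 'SKILLBOSS_API_KEY')
```

Passing the environment in as an argument keeps the helper easy to exercise without touching the real process environment.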
Step 2: Create Transcription API
Convert audio to text with Whisper:
// app/api/transcribe/route.ts
import { OpenAI } from 'openai'

// The OpenAI SDK, pointed at the SkillBoss endpoint
const client = new OpenAI({
  baseURL: 'https://api.skillboss.co/v1',
  apiKey: process.env.SKILLBOSS_API_KEY,
})

export async function POST(req: Request) {
  const formData = await req.formData()
  const audio = formData.get('audio')
  // Reject requests that do not carry an uploaded file
  if (!audio || typeof audio === 'string') {
    return Response.json({ error: 'Missing audio file' }, { status: 400 })
  }
  const transcription = await client.audio.transcriptions.create({
    file: audio,
    model: 'openai/whisper-1',
  })
  return Response.json({ text: transcription.text })
}
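Before forwarding an upload to the transcription model, it can help to reject obviously bad files early. The sketch below is one possible check, not part of the original route. The 25 MB cap is the limit OpenAI documents for its own transcription endpoint; whether SkillBoss applies the same cap is an assumption, and `validateAudioUpload` and `UploadLike` are hypothetical names:

```typescript
// Assumed limit: OpenAI's hosted transcription endpoint caps uploads at 25 MB.
// Whether SkillBoss enforces the same cap is not stated in this guide.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024

// Structural type so the check accepts a File, a Blob, or a plain test object
type UploadLike = { size: number; type: string }

// Returns an error message, or null when the upload looks acceptable
function validateAudioUpload(file: UploadLike | null): string | null {
  if (file === null) return 'No audio file was uploaded'
  if (file.size === 0) return 'The audio file is empty'
  if (file.size > MAX_AUDIO_BYTES) return 'The audio file exceeds 25 MB'
  // Browser recordings usually report types such as "audio/webm;codecs=opus"
  if (file.type.indexOf('audio/') !== 0) {
    return `Unsupported content type: ${file.type || 'unknown'}`
  }
  return null
}
```

In the route, a non-null result would typically be returned to the caller as a 400 response instead of being sent on to the model.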
Step 3: Create Article Polish API
Clean up and structure the transcript with GPT-4o:
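// app/api/polish/route.ts
import { OpenAI } from 'openai'

// Same SkillBoss-backed client as in the transcription route
const client = new OpenAI({
  baseURL: 'https://api.skillboss.co/v1',
  apiKey: process.env.SKILLBOSS_API_KEY,
})

export async function POST(req: Request) {
  const { rawText } = await req.json()
  const polished = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [{
      role: 'system',
      content: 'Convert this raw transcript into a well-structured article. Add headings, fix grammar, improve flow. Keep the author\'s voice.'
    }, {
      role: 'user',
      content: rawText
    }],
  })
  return Response.json({
    // content can be null, so fall back to an empty string
    article: polished.choices[0].message.content ?? ''
  })
}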
Step 4: Create Text-to-Speech API
Generate audio playback with ElevenLabs:
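// app/api/tts/route.ts
import { OpenAI } from 'openai'

// Same SkillBoss-backed client as in the other routes
const client = new OpenAI({
  baseURL: 'https://api.skillboss.co/v1',
  apiKey: process.env.SKILLBOSS_API_KEY,
})

export async function POST(req: Request) {
  const { text } = await req.json()
  const audio = await client.audio.speech.create({
    model: 'elevenlabs/eleven-turbo-v2',
    voice: 'alloy',
    input: text,
  })
  // Read the generated audio into a buffer and return it as MP3
  const buffer = Buffer.from(await audio.arrayBuffer())
  return new Response(buffer, {
    headers: { 'Content-Type': 'audio/mpeg' }
  })
}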
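The polish prompt asks for headings, so the article may well come back as Markdown. Sent to text-to-speech unchanged, markers such as `#` or `**` could be read aloud or cause odd pauses. The sketch below strips the most common markers first; `markdownToSpeechText` is a hypothetical helper, and it is a handful of regular expressions rather than a full Markdown parser:

```typescript
// Hypothetical helper: removes common Markdown markers before text-to-speech.
// Regex-based and deliberately simple; it does not handle every Markdown construct.
function markdownToSpeechText(markdown: string): string {
  return markdown
    .replace(/^#{1,6}[ \t]+/gm, '')          // heading markers at line start
    .replace(/^[ \t]*[-*+][ \t]+/gm, '')     // bullet markers at line start
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1') // links: keep only the label
    .replace(/\*\*|__|\*|`/g, '')            // bold, italic and inline-code markers
    .replace(/\n{3,}/g, '\n\n')              // collapse runs of blank lines
    .trim()
}
```

A caller could apply it to the article text just before posting to `/api/tts`, so that the displayed article keeps its formatting while the spoken version stays clean.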
Step 5: Build UI
Voice recorder + article display:
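// app/page.tsx
'use client'
import { useRef, useState } from 'react'
export default function VoiceArticle() {
  const [recording, setRecording] = useState(false)
  const [article, setArticle] = useState('')
  const [audioUrl, setAudioUrl] = useState('')
  const recorderRef = useRef<MediaRecorder | null>(null)

  // Run the recording through transcribe -> polish -> TTS
  const processRecording = async (blob: Blob) => {
    const form = new FormData()
    form.append('audio', blob, 'recording.webm')
    const { text } = await (await fetch('/api/transcribe', { method: 'POST', body: form })).json()
    const json = { method: 'POST', headers: { 'Content-Type': 'application/json' } }
    const polished = await (await fetch('/api/polish', { ...json, body: JSON.stringify({ rawText: text }) })).json()
    setArticle(polished.article)
    const speech = await fetch('/api/tts', { ...json, body: JSON.stringify({ text: polished.article }) })
    setAudioUrl(URL.createObjectURL(await speech.blob()))
  }

  // First click starts recording; second click stops it and starts processing
  const record = async () => {
    if (recording) {
      recorderRef.current?.stop()
      setRecording(false)
      return
    }
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
    const recorder = new MediaRecorder(stream)
    const chunks: Blob[] = []
    recorder.ondataavailable = (e) => chunks.push(e.data)
    recorder.onstop = () => {
      stream.getTracks().forEach((track) => track.stop()) // release the microphone
      processRecording(new Blob(chunks, { type: recorder.mimeType }))
    }
    recorderRef.current = recorder
    recorder.start()
    setRecording(true)
  }

  return (
    <div className="max-w-3xl mx-auto p-8">
      <h1 className="text-3xl font-bold mb-8">Voice to Article</h1>
      <button
        onClick={record}
        className="bg-red-500 text-white rounded-full w-20 h-20 mb-8"
      >
        {recording ? '⏸️' : '🎤'}
      </button>
      {article && (
        <div className="prose lg:prose-xl">
          {/* Render as text: model output is not sanitized HTML */}
          <div className="whitespace-pre-wrap">{article}</div>
          {audioUrl && <audio src={audioUrl} controls className="w-full mt-8" />}
        </div>
      )}
    </div>
  )
}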
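When the recorded blob is uploaded, it needs a filename, and the extension should match what the browser actually recorded, since transcription services commonly infer the audio format from it. Browsers differ here: Chromium-based browsers typically record WebM, while Safari typically records MP4. A minimal sketch of a mapping from the recorder's MIME type to an extension; `extensionForMimeType` is a hypothetical helper, and the list of types is an assumption, not an exhaustive table:

```typescript
// Hypothetical helper: maps a MediaRecorder MIME type to a file extension.
function extensionForMimeType(mimeType: string): string {
  // Drop parameters such as ";codecs=opus" and normalise case
  const base = (mimeType.split(';')[0] || '').trim().toLowerCase()
  const known: Record<string, string> = {
    'audio/webm': 'webm',
    'audio/mp4': 'mp4',
    'audio/ogg': 'ogg',
    'audio/wav': 'wav',
    'audio/mpeg': 'mp3',
  }
  // Unknown or empty types fall back to webm (an assumption of this sketch)
  return known[base] || 'webm'
}

// Assumed usage when building the upload:
// form.append('audio', blob, `recording.${extensionForMimeType(blob.type)}`)
```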