Speech-to-Text
Endpoint
POST /v1/audio/transcriptions

Transcribes audio to text with automatic provider fallback. The API is OpenAI-compatible: any client that works with OpenAI's audio transcriptions API can point to this gateway instead.
Provider Fallback Chain
The gateway tries each provider in order until one succeeds:
- Groq Whisper (primary): whisper-large-v3-turbo, whisper-large-v3. Fastest, with the best quality on the free tier.
- Cloudflare Workers AI (fallback): @cf/openai/whisper. No external key required.
- Gemini audio understanding (last resort): gemini-2.5-flash with audio input. Useful when Groq is rate-limited and you still need transcription.
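
The "try each provider in order until one succeeds" behaviour can be pictured with a minimal sketch like the one below. It is only an illustration: the provider objects, their `transcribe` functions, and the stubbed results are hypothetical stand-ins, not the gateway's actual implementation.

```javascript
// Hypothetical sketch of a sequential fallback chain.
// Each provider is assumed to expose a name and an async transcribe(audio).
async function transcribeWithFallback(providers, audio) {
  const errors = [];
  for (const provider of providers) {
    try {
      const text = await provider.transcribe(audio);
      // First success wins; later providers are never called.
      return { provider: provider.name, text };
    } catch (err) {
      // Record the failure and move on to the next provider in the chain.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Stub providers simulating a rate-limited primary (illustrative only).
const demoProviders = [
  { name: 'groq', transcribe: async () => { throw new Error('rate limited'); } },
  { name: 'workers-ai', transcribe: async () => 'hello world' },
  { name: 'gemini', transcribe: async () => 'unused' },
];

transcribeWithFallback(demoProviders, null)
  .then((result) => console.log(result.provider, result.text)); // logs: workers-ai hello world
```

Collecting the per-provider errors means that, if every provider fails, the caller sees why each one was skipped rather than only the last failure.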
With model: "auto" (or omitted), the gateway picks the healthiest provider based on live success rate and remaining daily headroom. You can also pin a specific model.
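
One plausible way to combine "live success rate" and "remaining daily headroom" into a single choice is sketched below. The scoring formula, the field names (`successRate`, `remainingToday`, `dailyLimit`), and the sample numbers for Workers AI and Gemini are assumptions made for illustration; the gateway's real selection logic is not documented here and may differ.

```javascript
// Hypothetical health-aware selection: the score and field names are assumptions.
function pickProvider(providers) {
  // Providers with no daily headroom left are not eligible.
  const available = providers.filter((p) => p.remainingToday > 0);
  if (available.length === 0) return null;

  // Score = success rate weighted by the fraction of daily quota still unused.
  const score = (p) => p.successRate * (p.remainingToday / p.dailyLimit);
  return available.reduce((best, p) => (score(p) > score(best) ? p : best));
}

// Made-up health snapshot: Groq has exhausted its 2,000 daily requests.
const health = [
  { name: 'groq', successRate: 0.99, remainingToday: 0, dailyLimit: 2000 },
  { name: 'workers-ai', successRate: 0.95, remainingToday: 500, dailyLimit: 1000 },
  { name: 'gemini', successRate: 0.97, remainingToday: 100, dailyLimit: 1000 },
];

console.log(pickProvider(health).name); // logs: workers-ai
```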
Request
multipart/form-data

| Field | Type | Default | Description |
|---|---|---|---|
| file | file | required | Audio file to transcribe. Supported formats: mp3, mp4, wav, webm, m4a. |
| model | string | whisper-large-v3-turbo | Whisper / audio model. Use auto for health-aware fallback across Groq → Workers AI → Gemini. |
| language | string | none | ISO-639-1 language code (e.g. en, es, fr). Improves accuracy when specified. |
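
As a rough sketch of how a client might assemble these fields, the helper below checks the file extension against the supported formats listed in the table and only appends the optional fields when they are provided. The helper name `buildTranscriptionForm` is hypothetical, and the extension check is a client-side convenience, not something the gateway is known to require. It assumes a runtime with global `FormData` and `Blob` (browsers, Node.js 18+).

```javascript
// Supported formats as listed in the request table.
const SUPPORTED_FORMATS = ['mp3', 'mp4', 'wav', 'webm', 'm4a'];

// Hypothetical helper: validates the filename and builds the multipart body.
function buildTranscriptionForm(file, filename, { model, language } = {}) {
  const ext = filename.split('.').pop().toLowerCase();
  if (!SUPPORTED_FORMATS.includes(ext)) {
    throw new Error(`Unsupported audio format: ${ext}`);
  }

  const form = new FormData();
  form.append('file', file, filename); // required
  if (model) form.append('model', model); // optional; omitted means the server default
  if (language) form.append('language', language); // optional ISO-639-1 code
  return form;
}

// Usage with an in-memory placeholder blob (not real audio).
const form = buildTranscriptionForm(new Blob(['placeholder']), 'recording.mp3', {
  model: 'auto',
  language: 'en',
});
console.log(form.get('model'), form.get('language')); // logs: auto en
```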
Response
```json
{ "text": "The transcribed text appears here." }
```

Free Limits
- 2,000 requests/day (Groq free tier)
- 8 hours of audio/day (Groq free tier)
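
To make the two limits concrete: 8 hours of audio is 28,800 seconds, so a client that wants to stay within the Groq free tier could track both counters locally. The sketch below is a hypothetical client-side budget; it does not reset at a day boundary, and how the provider actually counts or resets usage is not specified here.

```javascript
const DAILY_REQUEST_LIMIT = 2000;          // requests/day (Groq free tier)
const DAILY_AUDIO_SECONDS = 8 * 60 * 60;   // 8 hours/day = 28,800 seconds

// Hypothetical in-memory tracker; a real client would also reset it daily.
function createDailyBudget() {
  let requests = 0;
  let audioSeconds = 0;
  return {
    // Records the clip and returns true only if it fits in both limits.
    tryConsume(clipSeconds) {
      if (requests + 1 > DAILY_REQUEST_LIMIT) return false;
      if (audioSeconds + clipSeconds > DAILY_AUDIO_SECONDS) return false;
      requests += 1;
      audioSeconds += clipSeconds;
      return true;
    },
    remaining() {
      return {
        requests: DAILY_REQUEST_LIMIT - requests,
        audioSeconds: DAILY_AUDIO_SECONDS - audioSeconds,
      };
    },
  };
}

const budget = createDailyBudget();
budget.tryConsume(3600); // a one-hour recording
console.log(budget.remaining()); // logs: { requests: 1999, audioSeconds: 25200 }
```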
Examples
curl

```shell
curl https://your-gateway.workers.dev/v1/audio/transcriptions \
  -F file=@recording.mp3 \
  -F model=whisper-large-v3-turbo \
  -F language=en
```

JavaScript (fetch)

```javascript
// audioFile is a File or Blob, e.g. taken from an <input type="file"> element.
const formData = new FormData();
formData.append('file', audioFile);
formData.append('model', 'whisper-large-v3-turbo');

// Do not set Content-Type manually; fetch adds the multipart boundary itself.
const response = await fetch('https://your-gateway.workers.dev/v1/audio/transcriptions', {
  method: 'POST',
  body: formData,
});

const { text } = await response.json();
console.log(text);
```

OpenAI SDK (Node.js)

```javascript
import OpenAI from 'openai';
import fs from 'fs';

// Point the SDK at the gateway instead of api.openai.com.
const client = new OpenAI({
  baseURL: 'https://your-gateway.workers.dev/v1',
  apiKey: 'anything',
});

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('recording.mp3'),
  model: 'whisper-large-v3-turbo',
});

console.log(transcription.text);
```