
Speech-to-Text

POST /v1/audio/transcriptions

Transcribes audio to text with automatic provider fallback. The API is OpenAI-compatible — any client that works with OpenAI’s audio transcriptions API can point to this gateway instead.

The gateway tries each provider in order until one succeeds:

  1. Groq Whisper (primary) — whisper-large-v3-turbo, whisper-large-v3. Fastest, best quality on free tier.
  2. Cloudflare Workers AI (fallback) — @cf/openai/whisper. No external key required.
  3. Gemini audio understanding (last resort) — gemini-2.5-flash with audio input. Useful when Groq is rate-limited and you still need transcription.
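
The ordered fallback described above can be sketched as a simple loop. This is an illustration only, not the gateway's actual code: the `Provider` type, `transcribe_with_fallback`, and the three stub functions are hypothetical names, and the stubs do not call any real API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider type: takes raw audio bytes, returns the transcript,
# and raises an exception on failure (rate limit, timeout, bad response).
Provider = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try each (name, provider) pair in order; return (name, text) from the first success."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(audio)
        except Exception as exc:  # a real gateway would likely catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only.
def groq_stub(audio: bytes) -> str:
    raise TimeoutError("rate limited")  # simulate the primary being unavailable


def workers_ai_stub(audio: bytes) -> str:
    return "hello world"


def gemini_stub(audio: bytes) -> str:
    return "hello world (gemini)"


CHAIN = [("groq", groq_stub), ("workers-ai", workers_ai_stub), ("gemini", gemini_stub)]
```

Calling `transcribe_with_fallback(b"...", CHAIN)` with these stubs skips the failing primary and returns the Workers AI result; the last-resort provider is never invoked.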

With model: "auto" (or omitted), the gateway picks the healthiest provider based on live success rate and remaining daily headroom. You can also pin a specific model.
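
One way to picture the health-aware choice is a score that combines recent success rate with the fraction of daily quota still unused. The source does not state the gateway's actual formula, so the scoring rule, the `ProviderHealth` fields, and the function names below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderHealth:
    name: str
    successes: int    # recent successful requests
    attempts: int     # recent total requests
    used_today: int   # requests consumed so far today
    daily_limit: int  # daily request quota (assumed to be tracked per provider)

    def score(self) -> float:
        # Assumed scoring rule: success rate multiplied by remaining headroom.
        success_rate = self.successes / self.attempts if self.attempts else 1.0
        if self.daily_limit <= 0:
            return 0.0
        headroom = max(self.daily_limit - self.used_today, 0) / self.daily_limit
        return success_rate * headroom


def is_auto(model: Optional[str]) -> bool:
    """Treat a missing model or the literal "auto" as a request for automatic selection."""
    return model is None or model == "auto"


def pick_healthiest(fleet: List[ProviderHealth]) -> str:
    """Return the name of the provider with the highest score."""
    return max(fleet, key=lambda p: p.score()).name
```

Under this rule a provider with a high success rate but an almost exhausted quota scores lower than a slightly less reliable provider with plenty of headroom, which matches the behaviour described above.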

multipart/form-data

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| file | file | required | Audio file to transcribe. Supported formats: mp3, mp4, wav, webm, m4a. |
| model | string | whisper-large-v3-turbo | Whisper / audio model. Use auto for health-aware fallback across Groq → Workers AI → Gemini. |
| language | string | | ISO-639-1 language code (e.g. en, es, fr). Improves accuracy when specified. |

{
  "text": "The transcribed text appears here."
}
  • 2,000 requests/day (Groq free tier)
  • 8 hours of audio/day (Groq free tier)
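
The two limits interact: which one runs out first depends on clip length. The arithmetic below is a rough sketch that assumes both limits apply together and that audio is metered by clip duration; the function names are illustrative.

```python
DAILY_REQUESTS = 2_000
DAILY_AUDIO_SECONDS = 8 * 60 * 60  # 8 hours = 28,800 seconds


def breakeven_clip_seconds() -> float:
    """Average clip length at which both limits are exhausted at the same time."""
    return DAILY_AUDIO_SECONDS / DAILY_REQUESTS


def max_clips_per_day(clip_seconds: float) -> int:
    """How many clips of a fixed length fit under both daily limits."""
    by_audio = int(DAILY_AUDIO_SECONDS // clip_seconds)
    return min(DAILY_REQUESTS, by_audio)
```

The break-even point is 28,800 / 2,000 = 14.4 seconds per clip. Shorter clips hit the request limit first; longer clips hit the audio limit first, so one-minute clips cap out at 480 per day.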
curl https://your-gateway.workers.dev/v1/audio/transcriptions \
  -F file=@recording.mp3 \
  -F model=whisper-large-v3-turbo \
  -F language=en
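
The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_multipart` and `transcribe` are illustrative helpers, the host is the placeholder from the curl example, and no authentication header is sent because the curl example does not show one.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, List, Optional, Tuple

# Placeholder host taken from the curl example; replace with your own gateway.
GATEWAY_URL = "https://your-gateway.workers.dev/v1/audio/transcriptions"


def build_multipart(
    fields: Dict[str, str], filename: str, file_bytes: bytes
) -> Tuple[bytes, str]:
    """Encode text fields plus one file part named "file"; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []
    for key, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{key}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(
    path: str, model: str = "whisper-large-v3-turbo", language: Optional[str] = None
) -> str:
    """POST an audio file to the gateway and return the "text" field of the response."""
    fields = {"model": model}
    if language:
        fields["language"] = language
    with open(path, "rb") as f:
        body, content_type = build_multipart(fields, os.path.basename(path), f.read())
    request = urllib.request.Request(
        GATEWAY_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

For example, `transcribe("recording.mp3", language="en")` mirrors the curl command above. Because the API is OpenAI-compatible, an existing OpenAI client pointed at the gateway's base URL should work as well.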