Getting Started
What is the AI Gateway?
The AI Gateway is a drop-in OpenAI-compatible proxy that routes your requests across multiple free-tier AI providers. Point your existing OpenAI SDK or curl commands at the gateway and get resilient, load-balanced inference without managing individual provider keys.
Supported providers:
- Cloudflare Workers AI — fast inference at the edge
- Groq — ultra-low latency with LPU hardware
- Google Gemini — large context windows, multimodal, tool calling
- OpenRouter — aggregated access to many free models
- Cerebras — high-throughput open-weight models
- SambaNova — free tier with Llama 70B, DeepSeek V3, Qwen3
- NVIDIA NIM — free tier with large model catalog
- Voyage AI — purpose-built embedding models
Key Features
- Health-aware routing: the gateway tracks provider health and automatically skips degraded or rate-limited backends
- Auto round-robin: when `model` is set to `"auto"`, requests are distributed across healthy providers
- Capability-based filtering: automatically routes to models that support the features you need:
  - Send `tools` → only routes to models with tool/function calling
  - Send `response_format: { type: "json_object" }` → only routes to models with JSON mode
  - Send images in messages → only routes to vision-capable models
  - Large prompts → automatically excludes models with insufficient context windows
- Streaming: full server-sent events (SSE) support for all chat endpoints
- Embeddings: unified `/v1/embeddings` endpoint backed by Workers AI, Gemini, and Voyage AI
- Voice: speech-to-text via Groq Whisper and speech-to-speech (STT + LLM + TTS) with Workers AI
- Analytics: per-request logging with a dashboard at `/usage`
- OpenAI-compatible: works with any client that supports the OpenAI API format, including agent frameworks like LangChain, CrewAI, and Vercel AI SDK
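As a rough illustration of capability-based filtering, the sketch below builds request bodies for `/v1/chat/completions` that add `tools` or `response_format` only when needed, so the gateway can narrow its routing accordingly. The helper `buildChatBody` and the `get_weather` tool are hypothetical names used for illustration; the tool definition follows the standard OpenAI function-calling shape, which the gateway is assumed to accept because it is OpenAI-compatible.

```javascript
// Hypothetical tool definition in the OpenAI function-calling format.
// "get_weather" is an illustrative name, not something the gateway provides.
const weatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
};

// Build a chat completions body. Optional fields are added only when asked for,
// so a plain request stays eligible for every healthy model.
function buildChatBody(prompt, { tools, jsonMode } = {}) {
  const body = {
    model: 'auto',
    messages: [{ role: 'user', content: prompt }],
  };
  if (tools && tools.length > 0) body.tools = tools; // tool-calling models only
  if (jsonMode) body.response_format = { type: 'json_object' }; // JSON-mode models only
  return body;
}

const plain = buildChatBody('Hello!');
const withTools = buildChatBody('Weather in Paris?', { tools: [weatherTool] });
const asJson = buildChatBody('List three colors as JSON.', { jsonMode: true });

console.log(JSON.stringify(withTools, null, 2));
```

Each body can be passed as `JSON.stringify(body)` to a POST request or spread into an OpenAI SDK call. Keeping optional fields out of plain requests leaves the widest pool of models available to them.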
Check Provider Status
Before your first request, verify the gateway is live and see which providers are healthy. No API key is needed:

```bash
curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/routing/status
```

You’ll get a JSON list of active providers with their current latency, headroom, and cooldown state. Providers with `"degraded": false` are ready to route to. If a provider is rate-limited, the gateway auto-skips it and tries the next one.
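The same check can be done from code. The sketch below assumes the endpoint returns a bare JSON array with one object per provider, each carrying a boolean `degraded` field. Only `degraded` comes from the description above: the array shape, the `provider` field name, and the helper names are assumptions, so adjust them to the actual response. It also assumes a runtime with a global `fetch` (Node 18+ or a browser).

```javascript
// Hypothetical sample response. Only the "degraded" field is taken from the docs;
// the array shape and the "provider" field name are assumptions.
const sampleStatus = [
  { provider: 'groq', degraded: false },
  { provider: 'cerebras', degraded: true },
  { provider: 'gemini', degraded: false },
];

// Keep only the entries that are explicitly marked as not degraded.
function readyProviders(status) {
  return status.filter((entry) => entry.degraded === false);
}

// Fetch the live status and return the ready entries.
// baseUrl is the gateway origin, e.g. 'https://your-gateway.workers.dev'.
async function fetchReadyProviders(baseUrl) {
  const res = await fetch(`${baseUrl}/v1/routing/status`);
  if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);
  return readyProviders(await res.json());
}

console.log(readyProviders(sampleStatus).map((entry) => entry.provider));
```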
What to expect on the free tier
This gateway is best-effort: it aggregates each upstream’s free tier rather than offering an SLA. Plan for these limits up front:
- Per-IP gateway limit: ~10 requests burst, ~20 requests/minute sustained. Exceeding this returns `429`; back off and retry.
- Per-model daily caps: each model has its own free-tier daily budget (see the Models page). When a model is out of headroom or hits an upstream rate limit, the gateway cools it down and automatically falls back to the next-best capable model.
- `503` only when everything is degraded: because the gateway retries across providers, you only see a `503` when every capable model is currently rate-limited. Free-tier windows usually reset within minutes; checking `/v1/routing/status` will show recovery.
- `model: "auto"` is not deterministic: the gateway picks the best healthy model that matches your required capabilities (tools, JSON, vision, context). Pin a specific `model` ID if your app needs a stable backend.
- Need production throughput? Bring your own provider API keys and call them directly. The free tiers are intentionally capped so this gateway can stay free for everyone.
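One way to handle the `429` and `503` responses described above is a small retry wrapper with exponential backoff. This is a minimal sketch, not a gateway-provided client: the helper names, the 1-second base delay, the 30-second cap, and the four-attempt limit are all assumptions to tune for your workload. It assumes a runtime with a global `fetch` (Node 18+ or a browser).

```javascript
// Statuses the limits above describe as transient on the free tier.
const RETRYABLE_STATUSES = new Set([429, 503]);

function shouldRetry(status) {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
// attempt is zero-based. The default values are assumptions.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Call fetch up to maxAttempts times, sleeping between retryable failures.
// Returns the last response, so callers can still inspect a final 429 or 503.
async function fetchWithRetry(url, init, maxAttempts = 4) {
  let response;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    response = await fetch(url, init);
    if (!shouldRetry(response.status)) return response;
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
  return response;
}
```

`fetchWithRetry` takes the same arguments as `fetch`, so it can replace a direct call to `/v1/chat/completions`. If many clients share one IP, adding random jitter to the delay helps avoid synchronized retries.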
Quick Start
Send your first request in under a minute.

```bash
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ]
  }'
```

```javascript
// Plain fetch, no SDK required
const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <GATEWAY_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [
      { role: 'user', content: 'Hello! What can you do?' },
    ],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

```javascript
// OpenAI SDK pointed at the gateway
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-gateway.workers.dev/v1',
  apiKey: '<GATEWAY_API_KEY>',
});

const completion = await client.chat.completions.create({
  model: 'auto',
  extra_body: { project_id: 'my_project' },
  messages: [{ role: 'user', content: 'Hello! What can you do?' }],
});

console.log(completion.choices[0].message.content);
```

Next Steps
- Authentication: learn how to obtain and use your API key
- Chat Completions: full reference for the chat endpoint
- Embeddings: generate vector embeddings
- Speech-to-Text: transcribe audio with Groq Whisper
- Speech-to-Speech: full voice pipeline (audio in → audio out)
- Models: list available models and their capabilities