
Getting Started

The AI Gateway is a drop-in OpenAI-compatible proxy that routes your requests across multiple free-tier AI providers. Point your existing OpenAI SDK or curl commands at the gateway and get resilient, load-balanced inference without managing individual provider keys.

Supported providers:

  • Cloudflare Workers AI — fast inference at the edge
  • Groq — ultra-low latency with LPU hardware
  • Google Gemini — large context windows, multimodal, tool calling
  • OpenRouter — aggregated access to many free models
  • Cerebras — high-throughput open-weight models
  • SambaNova — free tier with Llama 70B, DeepSeek V3, Qwen3
  • NVIDIA NIM — free tier with large model catalog
  • Voyage AI — purpose-built embedding models

Features:

  • Health-aware routing — the gateway tracks provider health and automatically skips degraded or rate-limited backends
  • Auto round-robin — when model is set to "auto", requests are distributed across healthy providers
  • Capability-based filtering — automatically routes to models that support the features you need:
    • Send tools → only routes to models with tool/function calling
    • Send response_format: { type: "json_object" } → only routes to models with JSON mode
    • Send images in messages → only routes to vision-capable models
    • Large prompts → automatically excludes models with insufficient context windows
  • Streaming — full server-sent events (SSE) support for all chat endpoints
  • Embeddings — unified /v1/embeddings endpoint backed by Workers AI, Gemini, and Voyage AI
  • Voice — speech-to-text via Groq Whisper and speech-to-speech (STT + LLM + TTS) with Workers AI
  • Analytics — per-request logging with a dashboard at /usage
  • OpenAI-compatible — works with any client that supports the OpenAI API format, including agent frameworks like LangChain, CrewAI, and Vercel AI SDK
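To see capability filtering in practice, the sketch below sends a request that includes a tools array in the standard OpenAI function-calling format. Under the rules above, the gateway should route it only to models with tool calling. GATEWAY_URL, GATEWAY_API_KEY, the gateway_chat_with_tools function, and the get_weather tool are hypothetical placeholders for illustration, not part of the gateway.

```bash
#!/usr/bin/env bash
# Sketch: a chat request carrying a tool definition, so capability-based
# filtering should restrict routing to tool-calling models.
# GATEWAY_URL, GATEWAY_API_KEY and the get_weather tool are hypothetical.

gateway_chat_with_tools() {
  local question="$1"  # no JSON escaping: must not contain double quotes
  # -d @- makes curl read the request body from stdin (the heredoc below).
  curl -s "${GATEWAY_URL}/v1/chat/completions" \
    -H "Authorization: Bearer ${GATEWAY_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
{
  "model": "auto",
  "messages": [{ "role": "user", "content": "${question}" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}
EOF
}

# Usage (assuming both variables are exported):
# gateway_chat_with_tools "What is the weather in Paris?"
```

If the chosen model decides to call the tool, an OpenAI-compatible response carries the call in choices[0].message.tool_calls; your code runs the tool and sends the result back in a follow-up message.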

Before your first request, verify the gateway is live and see which providers are healthy — no API key needed:

curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/routing/status

You’ll get a JSON list of active providers with their current latency, headroom, and cooldown state. Providers with "degraded": false are ready to route to. If a provider is rate-limited, the gateway auto-skips it and tries the next one.
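This page does not document the exact layout of the status response, so the sketch below avoids depending on it. It counts the "degraded": false entries in whatever JSON it receives on stdin. The function name count_healthy_providers is hypothetical, and the assumption is only that each provider entry carries a "degraded" boolean as described above.

```bash
#!/usr/bin/env bash
# Sketch: count providers reported as healthy by /v1/routing/status.
# Assumes each provider entry has a "degraded" boolean; works on both
# compact and pretty-printed JSON because it only pattern-matches that key.

count_healthy_providers() {
  # grep -o prints one match per line; wc -l counts them; tr strips padding.
  grep -oE '"degraded"[[:space:]]*:[[:space:]]*false' | wc -l | tr -d '[:space:]'
}

# Usage:
# curl -s https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/routing/status \
#   | count_healthy_providers
```

For anything beyond a quick check, a real JSON parser such as jq is the more robust choice.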

This gateway is best-effort — it aggregates each upstream’s free tier rather than offering an SLA. Plan for these limits up front:

  • Per-IP gateway limit: ~10 requests burst, ~20 requests/minute sustained. Exceeding this returns 429; back off and retry.
  • Per-model daily caps: each model has its own free-tier daily budget (see the Models page). When a model is out of headroom or hits an upstream rate limit, the gateway cools it down and automatically falls back to the next-best capable model.
  • 503 only when everything is degraded: because the gateway retries across providers, you only see a 503 when every capable model is currently rate-limited. Free-tier windows usually reset within minutes — checking /v1/routing/status will show recovery.
  • model: "auto" is not deterministic: the gateway picks the best healthy model that matches your required capabilities (tools, JSON, vision, context). Pin a specific model ID if your app needs a stable backend.
  • Need production throughput? Bring your own provider API keys and call them directly — the free tiers are intentionally capped so this gateway can stay free for everyone.
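The guidance for 429 and 503 responses amounts to retrying with backoff. Here is a minimal bash sketch, assuming hypothetical GATEWAY_URL and GATEWAY_API_KEY environment variables and a hypothetical gateway_post_with_retry function. The delay schedule (1 s, doubling) and the default of 5 attempts are illustrative choices, not values published by the gateway.

```bash
#!/usr/bin/env bash
# Sketch: POST a chat request, retrying with exponential backoff on 429/503.
# GATEWAY_URL and GATEWAY_API_KEY are assumed (hypothetical) env vars.

gateway_post_with_retry() {
  local body="$1"
  local max_attempts="${2:-5}"
  local delay=1 attempt status out
  out="$(mktemp)"
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # -o writes the body to a temp file; -w prints only the HTTP status code.
    status="$(curl -s -o "$out" -w '%{http_code}' \
      "${GATEWAY_URL}/v1/chat/completions" \
      -H "Authorization: Bearer ${GATEWAY_API_KEY}" \
      -H "Content-Type: application/json" \
      -d "$body")"
    if [[ "$status" != "429" && "$status" != "503" ]]; then
      # Not a retryable status: print the body, succeed only on 2xx.
      cat "$out"
      rm -f "$out"
      [[ "$status" == 2* ]]
      return
    fi
    echo "HTTP $status, retrying in ${delay}s (attempt $attempt/$max_attempts)" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  rm -f "$out"
  return 1
}

# Usage:
# gateway_post_with_retry '{"model": "auto", "messages": [{"role": "user", "content": "Hi"}]}'
```

Doubling the delay keeps a burst of retries well under the ~20 requests/minute per-IP limit. If retries keep failing with 503, checking /v1/routing/status will show when providers recover.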

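Send your first request in under a minute. Replace your-gateway.workers.dev with your gateway's URL and <GATEWAY_API_KEY> with your gateway API key: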

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ]
  }'
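Because all chat endpoints support server-sent events, the same request can stream tokens as they are generated by adding "stream": true, following the OpenAI request format. The sketch below wraps that in a hypothetical gateway_stream function and assumes hypothetical GATEWAY_URL and GATEWAY_API_KEY environment variables.

```bash
#!/usr/bin/env bash
# Sketch: the first request with "stream": true, printed as SSE lines.
# GATEWAY_URL and GATEWAY_API_KEY are assumed (hypothetical) env vars.

gateway_stream() {
  local prompt="$1"  # no JSON escaping: must not contain double quotes
  # -N turns off curl's output buffering so each SSE "data:" line
  # appears as soon as it arrives instead of when the response ends.
  curl -sN "${GATEWAY_URL}/v1/chat/completions" \
    -H "Authorization: Bearer ${GATEWAY_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"auto\", \"stream\": true, \"messages\": [{\"role\": \"user\", \"content\": \"${prompt}\"}]}"
}

# Usage:
# gateway_stream "Hello! What can you do?"
```

Each event arrives as a "data:" line holding a JSON chunk. Most OpenAI SDKs handle this for you when you pass stream=true to the chat call.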