Getting Started

What is the AI Gateway?

The AI Gateway is a drop-in OpenAI-compatible proxy that routes your requests across multiple free-tier AI providers. Point your existing OpenAI SDK or curl commands at the gateway and get resilient, load-balanced inference without managing individual provider keys.

Supported providers:

Cloudflare Workers AI — fast inference at the edge
Groq — ultra-low latency with LPU hardware
Google Gemini — large context windows, multimodal, tool calling
OpenRouter — aggregated access to many free models
Cerebras — high-throughput open-weight models
SambaNova — free tier with Llama 70B, DeepSeek V3, Qwen3
NVIDIA NIM — free tier with large model catalog
Voyage AI — purpose-built embedding models

Key Features

Health-aware routing — the gateway tracks provider health and automatically skips degraded or rate-limited backends
Auto round-robin — when model is set to "auto", requests are distributed across healthy providers
Capability-based filtering — automatically routes to models that support the features you need:
- Send tools → only routes to models with tool/function calling
- Send response_format: { type: "json_object" } → only routes to models with JSON mode
- Send images in messages → only routes to vision-capable models
- Large prompts → automatically excludes models with insufficient context windows
Streaming — full server-sent events (SSE) support for all chat endpoints
Embeddings — unified /v1/embeddings endpoint backed by Workers AI, Gemini, and Voyage AI
Voice — speech-to-text via Groq Whisper and speech-to-speech (STT + LLM + TTS) with Workers AI
Analytics — per-request logging with a dashboard at /usage
OpenAI-compatible — works with any client that supports the OpenAI API format, including agent frameworks like LangChain, CrewAI, and Vercel AI SDK
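The capability signals listed under capability-based filtering correspond to standard fields in an OpenAI-style chat request. As a minimal sketch, the following builds request bodies that would trigger each filter. The helper name `buildChatRequest`, its option names, and the `get_weather` tool are hypothetical and not part of the gateway; only the request field names come from the OpenAI chat format.

```javascript
// Hypothetical helper: builds an OpenAI-style chat body whose fields
// signal which capabilities the request needs. The helper itself is
// illustrative; only the field names follow the OpenAI chat format.
function buildChatRequest(prompt, { tools, json = false, imageUrl } = {}) {
  // A message carrying an image uses the multi-part content form.
  const content = imageUrl
    ? [
        { type: 'text', text: prompt },
        { type: 'image_url', image_url: { url: imageUrl } },
      ]
    : prompt;

  const body = { model: 'auto', messages: [{ role: 'user', content }] };
  if (tools && tools.length > 0) body.tools = tools; // needs tool calling
  if (json) body.response_format = { type: 'json_object' }; // needs JSON mode
  return body;
}

// Example: a request that needs both tool calling and JSON mode.
const weatherTool = {
  type: 'function',
  function: {
    name: 'get_weather', // hypothetical tool
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
};

const chatBody = buildChatRequest('Weather in Paris as JSON?', {
  tools: [weatherTool],
  json: true,
});
console.log(JSON.stringify(chatBody, null, 2));
```

The resulting object can be passed to `JSON.stringify` and sent as the body of a /v1/chat/completions request.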

Check Provider Status

Before your first request, verify the gateway is live and see which providers are healthy — no API key needed:

curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/routing/status

You’ll get a JSON list of active providers with their current latency, headroom, and cooldown state. Providers with "degraded": false are ready to route to. If a provider is rate-limited, the gateway auto-skips it and tries the next one.
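To act on the status response in code, a small filter is usually enough. The sketch below assumes the response parses to an array of provider objects, each with the boolean `degraded` flag described above. The array shape and the `provider` name field are assumptions, so check them against the real payload before relying on this.

```javascript
// Assumption: the status payload is an array of objects, each with a
// boolean `degraded` flag. The `provider` field name is hypothetical.
function healthyProviders(status) {
  return status.filter((entry) => entry.degraded === false);
}

// Sample data in the assumed shape (not real gateway output).
const sampleStatus = [
  { provider: 'groq', degraded: false },
  { provider: 'cerebras', degraded: true },
  { provider: 'gemini', degraded: false },
];

const ready = healthyProviders(sampleStatus).map((entry) => entry.provider);
console.log(ready); // [ 'groq', 'gemini' ]
```

With live data, pass the parsed JSON from /v1/routing/status in place of `sampleStatus`.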

What to expect on the free tier

This gateway is best-effort — it aggregates each upstream’s free tier rather than offering an SLA. Plan for these limits up front:

Per-IP gateway limit: ~10 requests burst, ~20 requests/minute sustained. Exceeding this returns 429; back off and retry.
Per-model daily caps: each model has its own free-tier daily budget (see the Models page). When a model is out of headroom or hits an upstream rate limit, the gateway cools it down and automatically falls back to the next-best capable model.
503 only when everything is degraded: because the gateway retries across providers, you only see a 503 when every capable model is currently rate-limited. Free-tier windows usually reset within minutes — checking /v1/routing/status will show recovery.
model: "auto" is not deterministic: the gateway picks the best healthy model that matches your required capabilities (tools, JSON, vision, context). Pin a specific model ID if your app needs a stable backend.
Need production throughput? Bring your own provider API keys and call them directly — the free tiers are intentionally capped so this gateway can stay free for everyone.
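Because 429 and 503 are expected on the free tier, clients generally want a retry wrapper. The following is one possible sketch using exponential backoff. The function name, attempt count, and delays are illustrative choices, not gateway recommendations, and `doFetch` is an injectable stand-in for `fetch` so the logic can be exercised without network access.

```javascript
// Sketch of a retry wrapper for 429 and 503 responses. All names and
// defaults here are illustrative assumptions.
async function fetchWithBackoff(url, options, {
  maxAttempts = 4,
  baseDelayMs = 1000,
  doFetch = fetch, // global fetch (Node 18+ or browsers) unless overridden
} = {}) {
  let response;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    response = await doFetch(url, options);
    // Anything other than 429/503 goes back to the caller as-is.
    if (response.status !== 429 && response.status !== 503) return response;
    // No point sleeping after the final attempt.
    if (attempt === maxAttempts - 1) break;
    // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
    const delayMs = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return response; // still 429 or 503 after all attempts
}
```

It takes the same first two arguments as `fetch`, so it can stand in for a direct `fetch` call to the gateway. If every attempt fails, the last 429 or 503 response is returned so the caller can decide what to do next.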

Quick Start

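Send your first request in under a minute. The same request is shown three ways: with curl, with fetch, and with the OpenAI Node.js SDK. Replace your-gateway.workers.dev with your gateway's hostname and <GATEWAY_API_KEY> with your API key.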

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ]
  }'

const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <GATEWAY_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [
      { role: 'user', content: 'Hello! What can you do?' },
    ],
  }),
});

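// 429 means the per-IP limit was hit; 503 means every capable model is cooling down.
if (!response.ok) {
  throw new Error(`Gateway returned HTTP ${response.status}`);
}

const data = await response.json();
console.log(data.choices[0].message.content);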

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-gateway.workers.dev/v1',
  apiKey: '<GATEWAY_API_KEY>',
});

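const completion = await client.chat.completions.create({
  model: 'auto',
  // Non-standard field: the Node SDK sends extra top-level keys as-is
  // (TypeScript users need a `// @ts-expect-error` on the next line).
  project_id: 'my_project',
  messages: [{ role: 'user', content: 'Hello! What can you do?' }],
});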

console.log(completion.choices[0].message.content);
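The requests above return a single JSON response. For the SSE streaming listed under Key Features, a client reads `data:` lines instead. The sketch below parses such lines, assuming the gateway emits the same chunk format as the OpenAI API (`choices[0].delta.content`, ending with `data: [DONE]`); that format is an assumption based on the gateway's stated OpenAI compatibility, and `extractDeltas` is a hypothetical helper name.

```javascript
// Extracts text deltas from OpenAI-style SSE output.
// Assumes complete lines; a real client must buffer partial lines
// that are split across network chunks.
function extractDeltas(sseText) {
  const pieces = [];
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blank and comment lines
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) pieces.push(delta);
  }
  return pieces;
}

// Sample stream text in the assumed format (not real gateway output).
const sampleStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  '',
  'data: [DONE]',
].join('\n');

console.log(extractDeltas(sampleStream).join('')); // Hello
```

When using the OpenAI SDK, passing `stream: true` to `client.chat.completions.create` and iterating the result with `for await` handles this parsing for you.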

Next Steps

Authentication — learn how to obtain and use your API key
Chat Completions — full reference for the chat endpoint
Embeddings — generate vector embeddings
Speech-to-Text — transcribe audio with Groq Whisper
Speech-to-Speech — full voice pipeline (audio in → audio out)
Models — list available models and their capabilities