Drop-in OpenAI-compatible gateway across 8 free providers — Groq, Gemini, Cerebras, SambaNova, NVIDIA, OpenRouter, Voyage, and Workers AI. Auto-failover when one provider throttles.
Routes across all 8 providers — Groq, Gemini, SambaNova, NVIDIA, Cerebras, OpenRouter, Voyage, and Workers AI — with health-aware fallback.
Tracks success rates, latency, and cooldowns per model. Unhealthy providers are automatically skipped.
Full Server-Sent Events streaming for chat completions and responses API. Drop-in OpenAI SDK compatible.
Generate embeddings via Workers AI, Gemini, or Voyage AI. Standard OpenAI embeddings format.
Request logging, per-provider stats, and a live usage dashboard. Monitor everything from /v1/analytics.
IP-based rate limiting on public endpoints. Per-provider daily limits with configurable thresholds.
Point your existing OpenAI SDK or curl commands at the gateway.
Set model: "auto"
and the gateway handles provider selection, retries, and failover.
Heads up: this aggregates free provider tiers — best-effort, no SLA.
Per-IP limit is ~10 burst / ~20 rpm, each model has a daily cap, and the gateway returns 503 only when every capable model is rate-limited — rate-limit windows reset within minutes.
For production throughput, pin a model and bring your own keys.