Provider Stats
Endpoint
GET /v1/stats/providers

Authentication: None (public).
Returns aggregated per-provider telemetry computed from the gateway’s in-memory health state. This is the raw signal used by model: "auto" routing: success rates, throttle frequency, and cooldown activity. Use it to build your own routing intelligence, health dashboards, or provider-ranking heuristics.
Response
    {
      "stats": [
        {
          "provider": "groq",
          "total_models": 12,
          "active_models": 11,
          "total_attempts": 8421,
          "throttle_count": 312,
          "throttle_rate": 0.037,
          "success_rate": 0.958,
          "avg_latency_ms": 612,
          "cooldown_events": 14,
          "models_in_cooldown": 1,
          "failure_breakdown": {
            "safety_refusal": 4,
            "usage_retriable": 298,
            "input_nonretriable": 7,
            "provider_fatal": 3
          },
          "avg_attempts_before_first_throttle": 84.2,
          "throttle_spacing_p50": 142000
        }
      ]
    }

Per-Provider Fields
| Field | Type | Description |
|---|---|---|
| provider | string | Provider slug (groq, gemini, together, workers_ai, openrouter, cerebras, sambanova, nvidia, voyage, pollinations, cohere, github, mistral). |
| total_models | number | All models configured for this provider. |
| active_models | number | Models currently enabled (not disabled in config). |
| total_attempts | number | Rolling attempt count across all models of this provider. |
| throttle_count | number | Requests that hit a retriable rate-limit / usage_retriable failure. |
| throttle_rate | number | throttle_count / total_attempts. Lower is better. |
| success_rate | number | (total_attempts − failures) / total_attempts. Range 0–1. |
| avg_latency_ms | number | Rolling average upstream latency across this provider’s models. |
| cooldown_events | number | Number of distinct times a model from this provider was placed into cooldown. |
| models_in_cooldown | number | How many of this provider’s models are in cooldown right now. |
| failure_breakdown.safety_refusal | number | Non-retriable safety refusals (content policy rejections). |
| failure_breakdown.usage_retriable | number | Rate-limit / quota errors eligible for retry with another provider. |
| failure_breakdown.input_nonretriable | number | Client-side errors (bad schema, too many tokens). Do not retry. |
| failure_breakdown.provider_fatal | number | Upstream 5xx / network failures. |
| avg_attempts_before_first_throttle | number \| null | Across every model of this provider, mean number of attempts before its first throttle event. Higher = more headroom. |
| throttle_spacing_p50 | number \| null | Median milliseconds between consecutive throttle events for models in this provider. Higher = throttles are rare. |
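To make the field definitions concrete, here is a minimal sketch that derives a few values from one entry of the stats array. The helper name summarizeProvider and the shape of its return value are hypothetical, not part of the gateway API. It assumes that active_models minus models_in_cooldown approximates the number of usable models, and it uses the figures from the sample response above.

```javascript
// Hypothetical helper: summarize one entry of the `stats` array.
function summarizeProvider(s) {
  // throttle_rate is documented as throttle_count / total_attempts;
  // guard against division by zero for providers with no traffic yet.
  const throttleRate = s.total_attempts > 0 ? s.throttle_count / s.total_attempts : 0;

  // Assumption: models that are enabled and not cooling down are usable.
  const availableModels = Math.max(0, s.active_models - s.models_in_cooldown);

  // Failure category with the largest count, or null if nothing was recorded.
  const [dominantFailure] = Object.entries(s.failure_breakdown ?? {}).reduce(
    (best, cur) => (cur[1] > best[1] ? cur : best),
    [null, 0],
  );

  return { provider: s.provider, throttleRate, availableModels, dominantFailure };
}

// Values copied from the sample response above.
const groqSample = {
  provider: 'groq',
  active_models: 11,
  total_attempts: 8421,
  throttle_count: 312,
  models_in_cooldown: 1,
  failure_breakdown: {
    safety_refusal: 4,
    usage_retriable: 298,
    input_nonretriable: 7,
    provider_fatal: 3,
  },
};

const summary = summarizeProvider(groqSample);
console.log(
  `${summary.provider}: ${summary.throttleRate.toFixed(3)} throttle rate, ` +
  `${summary.availableModels} models available, mostly ${summary.dominantFailure}`,
);
// prints "groq: 0.037 throttle rate, 10 models available, mostly usage_retriable"
```

Recomputing throttle_rate on the client is only a sanity check; the value returned by the endpoint should be treated as the source of truth.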
Using This For Routing Intelligence
The gateway’s own auto mode already does health-aware selection, but if you’re building a multi-tenant product and want to add your own policy on top, /v1/stats/providers gives you the live signal.
Rule-of-thumb heuristics
- Prefer providers with throttle_rate < 0.05. Anything above 5% means you’re frequently hitting rate limits and will pay a latency cost on retries.
- Avoid models_in_cooldown > 0 unless the provider still has untouched healthy models (active_models > models_in_cooldown).
- Weight by avg_attempts_before_first_throttle. If provider A gets throttled after 500 calls and provider B after 20, route the next 400 calls to A.
- Watch failure_breakdown.provider_fatal. Spikes here are upstream incidents, not your fault. Fail over fast.
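The first three heuristics can be combined into a single ranking function. The sketch below is one possible reading of them, not the gateway's own routing logic. The function name rankProviders, the MAX_THROTTLE_RATE constant, and the sample data are all hypothetical, and treating a null avg_attempts_before_first_throttle as zero headroom is a conservative assumption. Detecting provider_fatal spikes needs several snapshots over time, so it is left out here.

```javascript
// Hypothetical threshold taken from the 5% rule of thumb; tune for your traffic.
const MAX_THROTTLE_RATE = 0.05;

// Rank providers from a /v1/stats/providers payload, best candidate first.
function rankProviders(stats) {
  return stats
    // Keep providers that still have at least one model outside cooldown.
    .filter((s) => s.active_models - s.models_in_cooldown > 0)
    // Drop providers throttled on 5% or more of attempts.
    .filter((s) => s.throttle_rate < MAX_THROTTLE_RATE)
    .map((s) => ({
      provider: s.provider,
      // Assumption: null means "no data", so treat it as zero headroom.
      headroom: s.avg_attempts_before_first_throttle ?? 0,
      successRate: s.success_rate,
    }))
    // Most headroom first; break ties on success rate.
    .sort((a, b) => b.headroom - a.headroom || b.successRate - a.successRate)
    .map((s) => s.provider);
}

// Illustrative data only; these are not real provider measurements.
const sampleStats = [
  { provider: 'a', active_models: 3, models_in_cooldown: 0, throttle_rate: 0.01, success_rate: 0.97, avg_attempts_before_first_throttle: 500 },
  { provider: 'b', active_models: 4, models_in_cooldown: 1, throttle_rate: 0.02, success_rate: 0.99, avg_attempts_before_first_throttle: 20 },
  { provider: 'c', active_models: 2, models_in_cooldown: 2, throttle_rate: 0.0, success_rate: 0.99, avg_attempts_before_first_throttle: 900 },
  { provider: 'd', active_models: 5, models_in_cooldown: 0, throttle_rate: 0.12, success_rate: 0.85, avg_attempts_before_first_throttle: 300 },
  { provider: 'e', active_models: 1, models_in_cooldown: 0, throttle_rate: 0.03, success_rate: 0.95, avg_attempts_before_first_throttle: null },
];

console.log(rankProviders(sampleStats)); // prints [ 'a', 'b', 'e' ]
```

Provider c is excluded because every model is cooling down, and d because its throttle rate is above the threshold. An empty result means no provider passed the filters, in which case falling back to plain auto routing is a reasonable default.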
Client-side example
    const BASE = 'https://free-ai-gateway.sarthakagrawal927.workers.dev';

    // Pick the provider with the best success rate among those that still have
    // a model outside cooldown and are throttled on fewer than 5% of attempts.
    async function pickBestProvider() {
      const res = await fetch(`${BASE}/v1/stats/providers`);
      if (!res.ok) throw new Error(`stats request failed: ${res.status}`);
      const { stats } = await res.json();

      return stats
        .filter((s) => s.active_models - s.models_in_cooldown > 0)
        .filter((s) => s.throttle_rate < 0.05)
        .sort((a, b) => b.success_rate - a.success_rate)[0]?.provider;
    }

    const provider = await pickBestProvider();

    const headers = {
      'Authorization': 'Bearer <GATEWAY_API_KEY>',
      'Content-Type': 'application/json',
    };
    // Force a provider only when one passed the filters; otherwise the header
    // would carry the string "undefined", so leave the choice to auto routing.
    if (provider) headers['x-gateway-force-provider'] = provider;

    const chat = await fetch(`${BASE}/v1/chat/completions`, {
      method: 'POST',
      headers,
      body: JSON.stringify({
        model: 'auto',
        messages: [{ role: 'user', content: 'Hello' }],
        project_id: 'my_project',
      }),
    });

Examples
    curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/stats/providers

    const res = await fetch('https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/stats/providers');
    const { stats } = await res.json();

    for (const s of stats) {
      const sr = (s.success_rate * 100).toFixed(1);
      const tr = (s.throttle_rate * 100).toFixed(1);
      console.log(`${s.provider}: ${sr}% success, ${tr}% throttle, ${s.models_in_cooldown}/${s.active_models} cooling`);
    }

Related
- GET /health: per-model health state (success rate, latency, cooldown expiry, daily usage).
- GET /dashboard: live HTML dashboard consuming this feed.