
Provider Stats

GET /v1/stats/providers

Authentication: None — public.

Returns aggregated per-provider telemetry computed from the gateway’s in-memory health state. This is the raw signal used by model: "auto" routing: success rates, throttle frequency, and cooldown activity. Use it to build your own routing intelligence, health dashboards, or provider-ranking heuristics.

{
  "stats": [
    {
      "provider": "groq",
      "total_models": 12,
      "active_models": 11,
      "total_attempts": 8421,
      "throttle_count": 312,
      "throttle_rate": 0.037,
      "success_rate": 0.958,
      "avg_latency_ms": 612,
      "cooldown_events": 14,
      "models_in_cooldown": 1,
      "failure_breakdown": {
        "safety_refusal": 4,
        "usage_retriable": 298,
        "input_nonretriable": 7,
        "provider_fatal": 3
      },
      "avg_attempts_before_first_throttle": 84.2,
      "throttle_spacing_p50": 142000
    }
  ]
}
| Field | Type | Description |
| --- | --- | --- |
| provider | string | Provider slug (groq, gemini, together, workers_ai, openrouter, cerebras, sambanova, nvidia, voyage, pollinations, cohere, github, mistral). |
| total_models | number | All models configured for this provider. |
| active_models | number | Models currently enabled (not disabled in config). |
| total_attempts | number | Rolling attempt count across all models of this provider. |
| throttle_count | number | Requests that hit a retriable rate-limit / usage_retriable failure. |
| throttle_rate | number | throttle_count / total_attempts. Lower is better. |
| success_rate | number | (total_attempts − failures) / total_attempts. Range 0–1. |
| avg_latency_ms | number | Rolling average upstream latency across this provider's models. |
| cooldown_events | number | Number of distinct times a model from this provider was placed into cooldown. |
| models_in_cooldown | number | How many of this provider's models are in cooldown right now. |
| failure_breakdown.safety_refusal | number | Non-retriable safety refusals (content policy rejections). |
| failure_breakdown.usage_retriable | number | Rate-limit / quota errors eligible for retry with another provider. |
| failure_breakdown.input_nonretriable | number | Client-side errors (bad schema, too many tokens); do not retry. |
| failure_breakdown.provider_fatal | number | Upstream 5xx / network failures. |
| avg_attempts_before_first_throttle | number \| null | Across every model of this provider, mean number of attempts before its first throttle event. Higher = more headroom. |
| throttle_spacing_p50 | number \| null | Median milliseconds between consecutive throttle events for models in this provider. Higher = throttles are rare. |
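
The two ratio fields can be recomputed from the raw counters, which is handy for sanity-checking a response or aggregating several providers yourself. The sketch below is illustrative only: `deriveRates` is a hypothetical helper, and it assumes that "failures" in the success_rate formula is the sum of the failure_breakdown buckets, which the field descriptions do not state explicitly. The reported success_rate may therefore differ slightly from the recomputed value.

```javascript
// Recompute throttle_rate and success_rate from one entry's raw counters.
// Assumption (not confirmed by the docs): failures = sum of failure_breakdown.
function deriveRates(entry) {
  const failures = Object.values(entry.failure_breakdown ?? {})
    .reduce((sum, n) => sum + n, 0);
  const attempts = entry.total_attempts;

  if (!attempts) {
    // No traffic recorded: the ratios are undefined, so return null, not NaN.
    return { failures, throttleRate: null, successRate: null };
  }

  return {
    failures,
    throttleRate: entry.throttle_count / attempts,
    successRate: (attempts - failures) / attempts,
  };
}

// Counters copied from the sample response above.
const sampleEntry = {
  total_attempts: 8421,
  throttle_count: 312,
  failure_breakdown: {
    safety_refusal: 4,
    usage_retriable: 298,
    input_nonretriable: 7,
    provider_fatal: 3,
  },
};

const derived = deriveRates(sampleEntry);
```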

The gateway’s own auto mode already does health-aware selection, but if you’re building a multi-tenant product and want to add your own policy on top, /v1/stats/providers gives you the live signal.

  • Prefer providers with throttle_rate < 0.05 — anything above 5% means you’re frequently hitting rate limits and will pay latency cost on retries.
  • Avoid models_in_cooldown > 0 unless the provider still has untouched healthy models (active_models > models_in_cooldown).
  • Weight by avg_attempts_before_first_throttle — if provider A gets throttled after 500 calls and provider B after 20, route the next 400 calls to A.
  • Watch failure_breakdown.provider_fatal — spikes here are upstream incidents, not your fault. Fail over fast.
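
The four rules above could be combined into a single ranking function, sketched below. This is one possible policy, not the gateway's own algorithm: `rankProviders` and both thresholds are hypothetical, and treating a null avg_attempts_before_first_throttle as "never throttled so far" is an assumption about what null means.

```javascript
// Hypothetical thresholds; tune them against your own traffic.
const MAX_THROTTLE_RATE = 0.05; // rule 1: prefer throttle_rate < 5%
const MAX_FATAL_RATE = 0.01;    // rule 4: skip providers with many fatal errors

// Returns provider slugs, best candidate first.
function rankProviders(stats) {
  return stats
    // Rule 2: at least one active model must be outside cooldown.
    .filter((s) => s.active_models > s.models_in_cooldown)
    // Rule 1: drop providers that are throttled too often.
    .filter((s) => s.throttle_rate < MAX_THROTTLE_RATE)
    // Rule 4: drop providers whose share of fatal failures is too high.
    .filter((s) => {
      const fatal = s.failure_breakdown?.provider_fatal ?? 0;
      return s.total_attempts > 0 ? fatal / s.total_attempts <= MAX_FATAL_RATE : true;
    })
    .map((s) => ({
      provider: s.provider,
      // Assumption: null means no throttle has been observed yet.
      headroom: s.avg_attempts_before_first_throttle ?? Infinity,
      successRate: s.success_rate,
    }))
    // Rule 3: most headroom first; success rate breaks ties.
    // (Infinity - Infinity is NaN, which is falsy, so ties fall through.)
    .sort((a, b) => (b.headroom - a.headroom) || (b.successRate - a.successRate))
    .map((s) => s.provider);
}
```

Returning the whole ordered list, rather than only the first entry, leaves the caller free to fail over to the next provider when a forced request is rejected.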
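const BASE = 'https://free-ai-gateway.sarthakagrawal927.workers.dev';

// Return the slug of the healthiest provider, or undefined if none qualifies.
async function pickBestProvider() {
  const res = await fetch(`${BASE}/v1/stats/providers`);
  if (!res.ok) throw new Error(`stats request failed: ${res.status}`);
  const { stats } = await res.json();
  return stats
    // Keep providers with at least one active model outside cooldown.
    .filter((s) => s.active_models - s.models_in_cooldown > 0)
    // Drop providers throttled on 5% or more of attempts.
    .filter((s) => s.throttle_rate < 0.05)
    // Highest success rate first.
    .sort((a, b) => b.success_rate - a.success_rate)[0]?.provider;
}

const provider = await pickBestProvider();

const chat = await fetch(`${BASE}/v1/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <GATEWAY_API_KEY>',
    'Content-Type': 'application/json',
    // Force a provider only when one qualified; otherwise leave routing to auto mode.
    ...(provider ? { 'x-gateway-force-provider': provider } : {}),
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [{ role: 'user', content: 'Hello' }],
    project_id: 'my_project',
  }),
});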
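
To act on the advice about watching failure_breakdown.provider_fatal, one option is to poll the endpoint periodically and compare consecutive snapshots. The sketch below is a rough illustration: `fatalSpikes` and the `minIncrease` threshold are hypothetical, and it assumes the counter generally grows between polls. Because the counts are described as rolling, a counter can also shrink, in which case the difference is negative and the provider is simply not flagged.

```javascript
// Compare two `stats` arrays from successive polls and return the providers
// whose provider_fatal count rose by at least `minIncrease` (hypothetical default).
function fatalSpikes(previous, current, minIncrease = 5) {
  const before = new Map(
    previous.map((s) => [s.provider, s.failure_breakdown?.provider_fatal ?? 0]),
  );

  return current
    // Providers missing from the previous snapshot have no baseline; skip them.
    .filter((s) => before.has(s.provider))
    .filter((s) => {
      const now = s.failure_breakdown?.provider_fatal ?? 0;
      return now - before.get(s.provider) >= minIncrease;
    })
    .map((s) => s.provider);
}
```

Providers returned by this function are candidates for temporary exclusion from your own routing policy until their counters settle.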
curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/stats/providers
  • GET /health — per-model health state (success rate, latency, cooldown expiry, daily usage).
  • GET /dashboard — live HTML dashboard consuming this feed.