Provider Stats

Endpoint

GET /v1/stats/providers

Authentication: None — public.

Returns aggregated per-provider telemetry computed from the gateway’s in-memory health state. This is the same raw signal that `model: "auto"` routing uses: success rates, throttle frequency, and cooldown activity. Use it to build your own routing intelligence, health dashboards, or provider-ranking heuristics.
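Routing code that reads this feed before every request adds a round trip each time. One option is a small client-side cache. This is a sketch under assumptions: `createStatsCache`, its 30-second default TTL, and the injectable `fetchImpl` and `now` options are hypothetical names for illustration, not gateway features. Concurrent callers share a single in-flight request.

```javascript
// Hypothetical helper: caches the /v1/stats/providers payload for ttlMs so
// routing code can consult it often without refetching every time.
// fetchImpl and now are injectable (assumption) for tests and other runtimes.
function createStatsCache(baseUrl, { ttlMs = 30000, fetchImpl = globalThis.fetch, now = Date.now } = {}) {
  let cached = null;          // last successfully fetched stats array
  let fetchedAt = -Infinity;  // timestamp of that fetch
  let inFlight = null;        // shared promise while a fetch is running

  async function load() {
    const res = await fetchImpl(`${baseUrl}/v1/stats/providers`);
    if (!res.ok) throw new Error(`stats request failed: HTTP ${res.status}`);
    const { stats } = await res.json();
    cached = stats;
    fetchedAt = now();
    return stats;
  }

  return function getStats() {
    // Serve from cache while it is fresh.
    if (cached && now() - fetchedAt < ttlMs) return Promise.resolve(cached);
    // Otherwise start (or join) a single fetch; clear the slot when it settles.
    if (!inFlight) {
      inFlight = load().finally(() => {
        inFlight = null;
      });
    }
    return inFlight;
  };
}
```

Usage: `const getStats = createStatsCache('https://free-ai-gateway.sarthakagrawal927.workers.dev');` then `const stats = await getStats();`. The TTL is a guess; shorten it if you need to react to cooldowns faster.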

Response

```json
{
  "stats": [
    {
      "provider": "groq",
      "total_models": 12,
      "active_models": 11,
      "total_attempts": 8421,
      "throttle_count": 312,
      "throttle_rate": 0.037,
      "success_rate": 0.958,
      "avg_latency_ms": 612,
      "cooldown_events": 14,
      "models_in_cooldown": 1,
      "failure_breakdown": {
        "safety_refusal": 4,
        "usage_retriable": 298,
        "input_nonretriable": 7,
        "provider_fatal": 3
      },
      "avg_attempts_before_first_throttle": 84.2,
      "throttle_spacing_p50": 142000
    }
  ]
}
```
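Because the feed is public and unauthenticated, it is worth validating its shape before feeding it into routing decisions. A minimal guard, assuming the field names and types in the example above and in the field table; `parseProviderStats` is a hypothetical helper, and rejecting the whole payload on any malformed entry is a design choice, not gateway behavior.

```javascript
// Hypothetical shape guard for GET /v1/stats/providers. Field names and
// types follow the documented example; strictness is a design choice.
const NUMERIC_FIELDS = [
  'total_models', 'active_models', 'total_attempts', 'throttle_count',
  'throttle_rate', 'success_rate', 'avg_latency_ms', 'cooldown_events',
  'models_in_cooldown',
];
const NULLABLE_NUMERIC_FIELDS = ['avg_attempts_before_first_throttle', 'throttle_spacing_p50'];
const BREAKDOWN_FIELDS = ['safety_refusal', 'usage_retriable', 'input_nonretriable', 'provider_fatal'];

function parseProviderStats(body) {
  if (!body || !Array.isArray(body.stats)) {
    throw new TypeError('expected a body of the form { stats: [...] }');
  }
  return body.stats.map((s, i) => {
    const where = `stats[${i}]`;
    if (typeof s.provider !== 'string') {
      throw new TypeError(`${where}.provider must be a string`);
    }
    for (const f of NUMERIC_FIELDS) {
      if (typeof s[f] !== 'number') throw new TypeError(`${where}.${f} must be a number`);
    }
    for (const f of NULLABLE_NUMERIC_FIELDS) {
      if (s[f] !== null && typeof s[f] !== 'number') {
        throw new TypeError(`${where}.${f} must be a number or null`);
      }
    }
    const fb = s.failure_breakdown;
    if (!fb || typeof fb !== 'object') {
      throw new TypeError(`${where}.failure_breakdown must be an object`);
    }
    for (const f of BREAKDOWN_FIELDS) {
      if (typeof fb[f] !== 'number') {
        throw new TypeError(`${where}.failure_breakdown.${f} must be a number`);
      }
    }
    return s; // entry passed every check
  });
}
```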

Per-Provider Fields

| Field | Type | Description |
|---|---|---|
| `provider` | string | Provider slug (`groq`, `gemini`, `together`, `workers_ai`, `openrouter`, `cerebras`, `sambanova`, `nvidia`, `voyage`, `pollinations`, `cohere`, `github`, `mistral`). |
| `total_models` | number | All models configured for this provider. |
| `active_models` | number | Models currently enabled (not disabled in config). |
| `total_attempts` | number | Rolling count of attempts across all of this provider’s models. |
| `throttle_count` | number | Attempts that hit a retriable rate-limit (`usage_retriable`) failure. |
| `throttle_rate` | number | `throttle_count / total_attempts`. Lower is better. |
| `success_rate` | number | `(total_attempts − failures) / total_attempts`, where `failures` is the number of failed attempts. Between 0 and 1. |
| `avg_latency_ms` | number | Rolling average upstream latency across this provider’s models, in milliseconds. |
| `cooldown_events` | number | Number of distinct times a model from this provider was placed into cooldown. |
| `models_in_cooldown` | number | Number of this provider’s models currently in cooldown. |
| `failure_breakdown.safety_refusal` | number | Non-retriable safety refusals (content-policy rejections). |
| `failure_breakdown.usage_retriable` | number | Rate-limit or quota errors eligible for retry with another provider. |
| `failure_breakdown.input_nonretriable` | number | Client-side errors (bad schema, too many tokens). Do not retry. |
| `failure_breakdown.provider_fatal` | number | Upstream 5xx and network failures. |
| `avg_attempts_before_first_throttle` | number \| null | Mean, over this provider’s models, of the number of attempts each model served before its first throttle event. Higher means more headroom. |
| `throttle_spacing_p50` | number \| null | Median interval, in milliseconds, between consecutive throttle events on this provider’s models. Higher means throttles are rarer. |
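Several of these fields can be cross-checked or regrouped client-side. This sketch recomputes `throttle_rate` from the raw counters and splits `failure_breakdown` into failures another provider might absorb (`usage_retriable`, plus `provider_fatal`, which the routing heuristics treat as grounds for failover) and failures the table marks as non-retriable. `summarizeProvider` and its output field names are hypothetical.

```javascript
// Hypothetical helper: derives secondary numbers from one provider entry.
// Grouping provider_fatal with usage_retriable is an interpretation.
function summarizeProvider(s) {
  const fb = s.failure_breakdown;
  return {
    provider: s.provider,
    // Same formula as throttle_rate, guarded against zero attempts.
    throttleRate: s.total_attempts > 0 ? s.throttle_count / s.total_attempts : 0,
    // Failures that switching provider may fix.
    failoverWorthy: fb.usage_retriable + fb.provider_fatal,
    // Failures the table marks as non-retriable.
    nonRetriable: fb.input_nonretriable + fb.safety_refusal,
    // Models that are enabled and not cooling down.
    healthyModels: s.active_models - s.models_in_cooldown,
    // Median throttle gap in seconds; null stays null.
    throttleSpacingSec: s.throttle_spacing_p50 === null ? null : s.throttle_spacing_p50 / 1000,
  };
}
```

For the sample response this yields a throttle rate of about 0.037 (matching the reported `throttle_rate`), 301 failover-worthy failures, 11 non-retriable failures, 10 healthy models, and a median throttle gap of 142 seconds.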

Using This For Routing Intelligence

The gateway’s own `auto` mode already does health-aware selection. If you’re building a multi-tenant product and want to layer your own policy on top, `/v1/stats/providers` gives you the live signal.

Rule-of-thumb heuristics

- Prefer providers with `throttle_rate < 0.05`. Above 5%, you hit rate limits often enough that retries add noticeable latency.
- Avoid providers with `models_in_cooldown > 0` unless they still have healthy models left (`active_models > models_in_cooldown`).
- Weight traffic by `avg_attempts_before_first_throttle`. If provider A typically absorbs about 500 calls before its first throttle and provider B only about 20, send most traffic to A (proportionally, 500 / 520 ≈ 96%).
- Watch `failure_breakdown.provider_fatal`. Spikes there indicate upstream incidents rather than problems with your requests, so fail over quickly.
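The first three rules can be combined into a weighted pick. This is a minimal sketch, not the gateway’s `auto` algorithm: `weightProviders` and `pickWeighted` are hypothetical helpers, the 0.05 cutoff comes from the first rule, and treating a `null` headroom value as "no throttle observed yet" (so it inherits the best observed headroom) is an assumption.

```javascript
// Hypothetical helper: keep providers that pass the rules of thumb, then give
// each a traffic share proportional to avg_attempts_before_first_throttle.
function weightProviders(stats, { maxThrottleRate = 0.05 } = {}) {
  const eligible = stats.filter(
    (s) =>
      s.active_models - s.models_in_cooldown > 0 && // at least one healthy model
      s.throttle_rate < maxThrottleRate             // not throttling too often
  );
  const known = eligible
    .map((s) => s.avg_attempts_before_first_throttle)
    .filter((v) => typeof v === 'number');
  // Assumption: null means "not throttled yet", so use the best observed value.
  const fallback = known.length > 0 ? Math.max(...known) : 1;
  const raw = eligible.map((s) => ({
    provider: s.provider,
    headroom: s.avg_attempts_before_first_throttle ?? fallback,
  }));
  const total = raw.reduce((sum, r) => sum + r.headroom, 0);
  return raw.map((r) => ({
    provider: r.provider,
    weight: total > 0 ? r.headroom / total : 1 / raw.length,
  }));
}

// Samples one provider according to its weight; rand is injectable for tests.
function pickWeighted(weighted, rand = Math.random) {
  let r = rand();
  for (const { provider, weight } of weighted) {
    if (r < weight) return provider;
    r -= weight;
  }
  // Floating-point leftovers fall through to the last entry.
  return weighted.length > 0 ? weighted[weighted.length - 1].provider : undefined;
}
```

Sampling by weight, rather than always taking the top provider, spreads load so your own traffic does not push the preferred provider into throttling.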

Client-side example

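```javascript
const BASE = 'https://free-ai-gateway.sarthakagrawal927.workers.dev';

// Returns the healthiest provider slug, or undefined if none qualifies.
async function pickBestProvider() {
  const res = await fetch(`${BASE}/v1/stats/providers`);
  if (!res.ok) throw new Error(`stats request failed: HTTP ${res.status}`);
  const { stats } = await res.json();

  return stats
    .filter((s) => s.active_models - s.models_in_cooldown > 0) // has a healthy model
    .filter((s) => s.throttle_rate < 0.05)                     // rarely throttled
    .sort((a, b) => b.success_rate - a.success_rate)[0]?.provider;
}

// Top-level await: run as an ES module (or wrap in an async function).
const provider = await pickBestProvider();

const headers = {
  'Authorization': 'Bearer <GATEWAY_API_KEY>',
  'Content-Type': 'application/json',
};
// Only force a provider when one qualified; otherwise let auto routing decide.
if (provider) headers['x-gateway-force-provider'] = provider;

const chat = await fetch(`${BASE}/v1/chat/completions`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'auto',
    messages: [{ role: 'user', content: 'Hello' }],
    project_id: 'my_project',
  }),
});
if (!chat.ok) throw new Error(`chat request failed: HTTP ${chat.status}`);
const completion = await chat.json();
```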
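Forcing a single provider gives up the gateway’s own failover for that request. One way to keep a safety net, sketched under assumptions: retry once without `x-gateway-force-provider` when the pinned attempt is throttled, fails upstream, or hits a network error. `chatWithFallback` is hypothetical, and it assumes the gateway reports throttling as HTTP 429 and upstream failures as 5xx. Other client errors are returned unchanged because, as with `input_nonretriable`, retrying them elsewhere will not help.

```javascript
// Hypothetical wrapper: pin the request to `provider`, then fall back once to
// plain auto routing on 429, 5xx, or a network error (status codes assumed).
async function chatWithFallback(baseUrl, apiKey, body, provider, fetchImpl = globalThis.fetch) {
  const send = (forcedProvider) => {
    const headers = {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    };
    if (forcedProvider) headers['x-gateway-force-provider'] = forcedProvider;
    return fetchImpl(`${baseUrl}/v1/chat/completions`, {
      method: 'POST',
      headers,
      body: JSON.stringify(body),
    });
  };

  // Nothing to pin: go straight to auto routing.
  if (!provider) return send(undefined);

  try {
    const forced = await send(provider);
    // Success or a client error: return as-is (retrying elsewhere won't help).
    if (forced.status !== 429 && forced.status < 500) return forced;
  } catch {
    // Network error on the pinned route: fall through to auto routing.
  }
  return send(undefined);
}
```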

Fetching the feed directly, from the shell and from JavaScript:

```bash
curl https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/stats/providers
```

```javascript
const res = await fetch('https://free-ai-gateway.sarthakagrawal927.workers.dev/v1/stats/providers');
if (!res.ok) throw new Error(`stats request failed: HTTP ${res.status}`);
const { stats } = await res.json();

// Print one summary line per provider.
for (const s of stats) {
  const sr = (s.success_rate * 100).toFixed(1);
  const tr = (s.throttle_rate * 100).toFixed(1);
  console.log(`${s.provider}: ${sr}% success, ${tr}% throttle, ${s.models_in_cooldown}/${s.active_models} cooling`);
}
```
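`failure_breakdown.provider_fatal` is a rolling count, so a spike shows up as a jump between two successive polls rather than as a single value. A minimal sketch of that diff: `fatalSpikes` and its default threshold of 5 new failures per poll are hypothetical, and because old events age out of the rolling window, the delta only approximates the number of new failures.

```javascript
// Hypothetical helper: compares two polls of /v1/stats/providers and reports
// providers whose provider_fatal count jumped by at least `threshold`.
function fatalSpikes(prevStats, currStats, threshold = 5) {
  const prevByProvider = new Map(prevStats.map((s) => [s.provider, s]));
  const spikes = [];
  for (const curr of currStats) {
    const prev = prevByProvider.get(curr.provider);
    if (!prev) continue; // provider absent from the previous poll: no baseline
    const delta = curr.failure_breakdown.provider_fatal - prev.failure_breakdown.provider_fatal;
    if (delta >= threshold) spikes.push({ provider: curr.provider, delta });
  }
  return spikes;
}
```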

Related endpoints

- `GET /health`: per-model health state (success rate, latency, cooldown expiry, daily usage).
- `GET /dashboard`: live HTML dashboard consuming this feed.