Chat Completions

Endpoint

POST /v1/chat/completions

Request Body

Field	Type	Default	Description
`model`	string	`"auto"`	Model identifier. Use `"auto"` for health-aware round-robin across all providers, or specify a model from `GET /v1/models`.
`messages`	array	required	Conversation history. Each item has `role`, `content`, and optional `name`.
`stream`	boolean	`false`	When `true`, responses are streamed as server-sent events.
`temperature`	number	—	Sampling temperature (0–2). Higher values produce more varied output.
`max_tokens`	number	—	Maximum number of tokens to generate.
`min_reasoning_level`	string	—	Minimum reasoning tier for auto-routing. One of `"low"`, `"medium"`, `"high"`.
`tools`	array	—	List of tool/function definitions. When present, the gateway only routes to models that support tool calling.
`tool_choice`	string | object	—	Controls tool use: `"none"`, `"auto"`, `"required"`, or `{ type: "function", function: { name: "..." } }`.
`response_format`	object	—	Set to `{ type: "json_object" }` for structured JSON output. The gateway only routes to models with JSON mode support.
`project_id`	string	required*	Project tag for analytics and rate accounting. You can also send `x-gateway-project-id`; one of the two is required.
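
The project tag can travel in the body or in the `x-gateway-project-id` header. The following is a minimal sketch of the header form. `buildChatRequest` is a hypothetical helper written for illustration and is not part of the gateway.

```javascript
// Hypothetical helper: builds fetch() arguments for the chat completions endpoint.
// The project tag is sent via the x-gateway-project-id header instead of the body.
function buildChatRequest(baseUrl, apiKey, projectId, body) {
  if (!Array.isArray(body.messages)) {
    throw new Error('messages is required and must be an array');
  }
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
        'x-gateway-project-id': projectId,
      },
      // model falls back to "auto", matching the documented default
      body: JSON.stringify({ model: 'auto', ...body }),
    },
  };
}

const req = buildChatRequest('https://your-gateway.workers.dev', '<GATEWAY_API_KEY>', 'my_project', {
  messages: [{ role: 'user', content: 'Hello' }],
  max_tokens: 64,
});
// fetch(req.url, req.init) would send the request
```

Keeping the tag in a header leaves the body identical to a plain OpenAI-style payload, which can be convenient when a client library controls the body.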

Message Object

Messages support both text and multimodal (vision) content:

{
  "role": "user",
  "content": "What is the capital of France?",
  "name": "alice"
}

For vision requests, use the array content format with image URLs:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
  ]
}

When the gateway detects `image_url` content parts, it automatically restricts routing to vision-capable models (e.g. Gemini, Groq Llama 4).

Field	Type	Description
`role`	string	One of `"system"`, `"user"`, `"assistant"`, `"tool"`.
`content`	string | array	Message text, or array of content parts for multimodal input.
`name`	string	Optional display name for the message author.
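
The two content formats can be produced by one small builder. This is a sketch only, and `userMessage` is a hypothetical helper name introduced here.

```javascript
// Hypothetical helper: returns a user message in either the plain-text or
// the multimodal (array) content format described above.
function userMessage(text, imageUrls = []) {
  if (imageUrls.length === 0) {
    return { role: 'user', content: text };
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text },
      ...imageUrls.map((url) => ({ type: 'image_url', image_url: { url } })),
    ],
  };
}

const plain = userMessage('What is the capital of France?');
const vision = userMessage("What's in this image?", ['https://example.com/photo.jpg']);
```

Here `plain.content` is a string, while `vision.content` is an array holding one text part followed by one `image_url` part.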

Capability-Based Routing

The gateway automatically detects what capabilities your request needs and filters models accordingly:

Request Feature	Detected Capability	Effect
`tools` array present	Tool calling	Only routes to models that support function calling
`response_format: { type: "json_object" }`	JSON mode	Only routes to models with structured output
`image_url` in message content	Vision	Only routes to vision-capable models
Large prompt	Context window	Excludes models whose context window is too small

This is fully automatic: no extra headers or configuration are needed.
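
To make the table concrete, the sketch below mirrors its first three rows on the client side. It is illustrative only: the gateway's actual detection logic is not shown in this document, and the function name and capability labels (`tools`, `json_mode`, `vision`) are assumptions. The context-window row is omitted because it depends on token counting.

```javascript
// Illustrative only: a client-side mirror of the capability table.
// Not the gateway's implementation; labels are made up for this sketch.
function detectCapabilities(request) {
  const caps = [];
  if (Array.isArray(request.tools)) caps.push('tools');
  if (request.response_format?.type === 'json_object') caps.push('json_mode');
  const hasImage = (request.messages ?? []).some(
    (m) => Array.isArray(m.content) && m.content.some((part) => part.type === 'image_url'),
  );
  if (hasImage) caps.push('vision');
  return caps;
}

const caps = detectCapabilities({
  model: 'auto',
  response_format: { type: 'json_object' },
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image as JSON.' },
      { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
    ],
  }],
});
console.log(caps); // [ 'json_mode', 'vision' ]
```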

Non-Streaming Response

A standard OpenAI-compatible response object is returned. The gateway adds extra diagnostic headers:

Header	Description
`x-gateway-provider`	The backend provider that served the request (e.g. `groq`, `gemini`).
`x-gateway-model`	The exact model used by the provider.
`x-gateway-attempts`	Number of provider attempts before a successful response.
`x-gateway-request-id`	Unique request identifier for support and log correlation.
`x-gateway-reasoning-effort`	The reasoning effort level applied to the request.
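
A small helper can gather these headers for logging. This is a sketch: `readGatewayHeaders` is a hypothetical name, the sample values are made up, and parsing `x-gateway-attempts` as an integer is an assumption about its format.

```javascript
// Collects the documented x-gateway-* headers into a plain object.
// `headers` is anything with a get(name) method, such as a fetch Response's headers.
function readGatewayHeaders(headers) {
  const attempts = headers.get('x-gateway-attempts');
  return {
    provider: headers.get('x-gateway-provider') ?? null,
    model: headers.get('x-gateway-model') ?? null,
    // Assumption: the header value is a decimal integer string
    attempts: attempts == null ? null : Number.parseInt(attempts, 10),
    requestId: headers.get('x-gateway-request-id') ?? null,
    reasoningEffort: headers.get('x-gateway-reasoning-effort') ?? null,
  };
}

// Stand-in for response.headers so the sketch runs without a network call
const sampleHeaders = new Map([
  ['x-gateway-provider', 'groq'],
  ['x-gateway-model', 'llama-3.3-70b-versatile'],
  ['x-gateway-attempts', '2'],
  ['x-gateway-request-id', 'req_123'],
]);
const info = readGatewayHeaders(sampleHeaders);
```

With a real response, the call would be `readGatewayHeaders(response.headers)`.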

Example Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 9,
    "total_tokens": 24
  }
}

Streaming Response

When `stream` is `true`, the response is a stream of `data:` lines in SSE format, each containing a JSON chunk object with a `delta`. The stream is terminated by `data: [DONE]`.

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]
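
The chunk format above can be folded back into a single message. The sketch below works on a complete block of SSE text. `assembleFromSse` is a hypothetical helper, and a live stream would also need to buffer partial lines between network reads.

```javascript
// Folds SSE "data:" lines into one message object.
// Assumes every line in sseText is complete (no partial JSON).
function assembleFromSse(sseText) {
  const message = { role: null, content: '', finishReason: null, done: false };
  for (const line of sseText.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip blank separator lines
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') {
      message.done = true;
      break;
    }
    const choice = JSON.parse(payload).choices?.[0];
    if (!choice) continue;
    if (choice.delta?.role) message.role = choice.delta.role;
    message.content += choice.delta?.content ?? '';
    if (choice.finish_reason) message.finishReason = choice.finish_reason;
  }
  return message;
}

const sampleStream = [
  'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}',
  '',
  'data: {"choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
  '',
  'data: [DONE]',
  '',
].join('\n');

const assembled = assembleFromSse(sampleStream);
```

For the sample input, `assembled.content` is `"The capital"` and `assembled.done` is `true`.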

Examples

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <GATEWAY_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'auto',
    project_id: 'my_project',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is the capital of France?' },
    ],
    temperature: 0.7,
    max_tokens: 256,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
// Inspect gateway headers
console.log('Provider:', response.headers.get('x-gateway-provider'));
console.log('Model:', response.headers.get('x-gateway-model'));

Streaming

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "Tell me a short story." }],
    "stream": true
  }'

const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <GATEWAY_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'auto',
    project_id: 'my_project',
    messages: [{ role: 'user', content: 'Tell me a short story.' }],
    stream: true,
  }),
});

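const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line until the rest of it arrives
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') {
      finished = true; // also stops the outer read loop
      break;
    }

    const chunk = JSON.parse(payload);
    const text = chunk.choices?.[0]?.delta?.content ?? '';
    process.stdout.write(text); // Node.js only; use another sink in the browser
  }
}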

With Reasoning Effort

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "Solve: x^2 - 5x + 6 = 0" }],
    "min_reasoning_level": "high"
  }'

const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <GATEWAY_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'auto',
    project_id: 'my_project',
    messages: [{ role: 'user', content: 'Solve: x^2 - 5x + 6 = 0' }],
    min_reasoning_level: 'high',
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);

Tool Calling (Agentic)

When you include `tools`, the gateway automatically routes to a model that supports function calling (Groq, Gemini, SambaNova, NVIDIA, Cerebras, or OpenRouter models with tool support).

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://your-gateway.workers.dev/v1',
  apiKey: '<GATEWAY_API_KEY>',
});

const completion = await client.chat.completions.create({
  model: 'auto',
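  // Gateway-specific field. The Node SDK sends extra top-level fields in the body as-is
  // (extra_body is a Python SDK option). TypeScript users may need a type cast.
  project_id: 'my_project',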
  messages: [{ role: 'user', content: 'What is the weather in San Francisco?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location'],
      },
    },
  }],
  tool_choice: 'auto',
});

const toolCall = completion.choices[0].message.tool_calls?.[0];
console.log(toolCall?.function.name, toolCall?.function.arguments);
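
Once a tool call comes back, the usual next step is to run the tool locally and return its result in a `tool` role message. The sketch below assumes the OpenAI convention of a `tool_call_id` field and JSON-string `arguments`, which this document does not spell out. `runToolCalls`, the stubbed handler, and the sample data are hypothetical.

```javascript
// Hypothetical local implementation of the get_weather tool (stubbed data)
const handlers = {
  get_weather: ({ location }) => ({ location, forecast: 'sunny' }),
};

// Turns the assistant's tool calls into "tool" role messages.
// Assumption: tool_call_id and JSON-string arguments follow the OpenAI convention.
function runToolCalls(assistantMessage, toolHandlers) {
  return (assistantMessage.tool_calls ?? []).map((call) => {
    const handler = toolHandlers[call.function.name];
    const result = handler
      ? handler(JSON.parse(call.function.arguments))
      : { error: `unknown tool: ${call.function.name}` };
    return { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

// Sample assistant message shaped like an OpenAI-compatible tool call response
const assistantMessage = {
  role: 'assistant',
  content: null,
  tool_calls: [{
    id: 'call_1',
    type: 'function',
    function: { name: 'get_weather', arguments: '{"location":"San Francisco"}' },
  }],
};

const toolMessages = runToolCalls(assistantMessage, handlers);
// Next request: messages = [...previousMessages, assistantMessage, ...toolMessages]
```

Sending the follow-up request with the same `tools` array lets the model turn the tool output into a final answer.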

JSON Mode (Structured Output)

When you set `response_format`, the gateway only picks models that support JSON mode.

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "List 3 programming languages with their year of creation as JSON" }],
    "response_format": { "type": "json_object" }
  }'

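// Reuses the `client` instance created in the Tool Calling example
const completion = await client.chat.completions.create({
  model: 'auto',
  // Gateway-specific field. The Node SDK sends extra top-level fields in the body as-is.
  project_id: 'my_project',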
  messages: [{ role: 'user', content: 'List 3 programming languages with their year of creation as JSON' }],
  response_format: { type: 'json_object' },
});

const parsed = JSON.parse(completion.choices[0].message.content);
console.log(parsed);
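
`JSON.parse` throws on malformed input, which can happen if generation is cut short, for example by `max_tokens`. A defensive wrapper is sketched below. `parseJsonContent` is a hypothetical helper, not a gateway or SDK function.

```javascript
// Parses JSON-mode output without letting JSON.parse throw.
function parseJsonContent(content) {
  if (typeof content !== 'string' || content.trim() === '') {
    return { ok: false, error: 'empty content' };
  }
  try {
    return { ok: true, value: JSON.parse(content) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const good = parseJsonContent('{"languages":[{"name":"C","year":1972}]}');
const bad = parseJsonContent('{"languages": [');
```

Here `good.ok` is `true` and `bad.ok` is `false`. A caller could retry the request or raise `max_tokens` when parsing fails.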

Vision (Image Input)

Send images using the multimodal content format. The gateway auto-detects images and routes to vision-capable models (Gemini, Groq Llama 4).

curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg" } }
      ]
    }]
  }'

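// Reuses the `client` instance created in the Tool Calling example
const completion = await client.chat.completions.create({
  model: 'auto',
  // Gateway-specific field. The Node SDK sends extra top-level fields in the body as-is.
  project_id: 'my_project',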
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image in detail.' },
      { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
    ],
  }],
});

console.log(completion.choices[0].message.content);