Chat Completions
Endpoint

    POST /v1/chat/completions

Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | "auto" | Model identifier. Use "auto" for health-aware round-robin across all providers, or specify a model from GET /v1/models. |
| messages | array | required | Conversation history. Each item has role, content, and optional name. |
| stream | boolean | false | When true, responses are streamed as server-sent events. |
| temperature | number | - | Sampling temperature (0–2). Higher values produce more varied output. |
| max_tokens | number | - | Maximum number of tokens to generate. |
| min_reasoning_level | string | - | Minimum reasoning tier for auto-routing. One of "low", "medium", "high". |
| tools | array | - | List of tool/function definitions. When present, the gateway only routes to models that support tool calling. |
| tool_choice | string or object | - | Controls tool use: "none", "auto", "required", or { type: "function", function: { name: "..." } }. |
| response_format | object | - | Set to { type: "json_object" } for structured JSON output. The gateway only routes to models with JSON mode support. |
| project_id | string | required* | Project tag for analytics and rate accounting. You can also send the x-gateway-project-id header; one of the two is required. |
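
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the gateway: validateChatRequest is a hypothetical helper name, and it enforces only the rules stated in the table (messages is an array, temperature lies in 0–2, min_reasoning_level is one of the three tiers, and a project ID is present in either the body or the x-gateway-project-id header). The gateway's own validation may be stricter or looser.

```javascript
// Hypothetical client-side check based only on the request body table.
const REASONING_LEVELS = ['low', 'medium', 'high'];

function validateChatRequest(body, headers = {}) {
  const errors = [];

  // messages is the only field marked as unconditionally required.
  if (!Array.isArray(body.messages)) {
    errors.push('messages must be an array');
  }

  // temperature is optional, but must fall in the documented 0–2 range.
  const t = body.temperature;
  if (t !== undefined && (typeof t !== 'number' || Number.isNaN(t) || t < 0 || t > 2)) {
    errors.push('temperature must be a number between 0 and 2');
  }

  // min_reasoning_level is optional, but limited to three values.
  const level = body.min_reasoning_level;
  if (level !== undefined && !REASONING_LEVELS.includes(level)) {
    errors.push('min_reasoning_level must be "low", "medium", or "high"');
  }

  // The project ID may arrive in the body or in a header (names are case-insensitive).
  const headerNames = Object.keys(headers).map((name) => name.toLowerCase());
  const inBody = typeof body.project_id === 'string' && body.project_id !== '';
  if (!inBody && !headerNames.includes('x-gateway-project-id')) {
    errors.push('project_id or the x-gateway-project-id header is required');
  }

  return errors;
}
```

An empty array means the request passes these checks; each string in a non-empty array names one problem to fix before calling the endpoint.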
Message Object
Messages support both text and multimodal (vision) content:

    {
      "role": "user",
      "content": "What is the capital of France?",
      "name": "alice"
    }

For vision requests, use the array content format with image URLs:

    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
      ]
    }

When the gateway detects image_url content parts, it automatically filters to vision-capable models (e.g. Gemini, Groq Llama 4).

| Field | Type | Description |
|---|---|---|
| role | string | One of "system", "user", "assistant", "tool". |
| content | string or array | Message text, or an array of content parts for multimodal input. |
| name | string | Optional display name for the message author. |
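
Both content formats can be produced by one small helper. This is a sketch under assumptions: userMessage is a hypothetical name, and the function only assembles the two shapes shown above (a plain string, or an array of text and image_url parts).

```javascript
// Hypothetical helper: returns a user message in whichever content format is needed.
function userMessage(text, imageUrls = []) {
  // Without images, the simple string form is enough.
  if (imageUrls.length === 0) {
    return { role: 'user', content: text };
  }

  // With images, switch to the array form: one text part, then one part per image URL.
  return {
    role: 'user',
    content: [
      { type: 'text', text },
      ...imageUrls.map((url) => ({ type: 'image_url', image_url: { url } })),
    ],
  };
}
```

For example, userMessage('What is in this image?', ['https://example.com/photo.jpg']) yields the same structure as the vision message above.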
Capability-Based Routing
The gateway automatically detects what capabilities your request needs and filters models accordingly:
| Request Feature | Detected Capability | Effect |
|---|---|---|
| tools array present | Tool calling | Only routes to models that support function calling |
| response_format: { type: "json_object" } | JSON mode | Only routes to models with structured output |
| image_url in message content | Vision | Only routes to vision-capable models |
| Large prompt | Context window | Excludes models whose context window is too small |
This is fully automatic: no extra headers or configuration are needed.
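
To predict which requests will narrow the pool of eligible models, the first three rows of the table can be mirrored on the client. The sketch below is illustrative only and is not the gateway's code: detectCapabilities and the capability labels are hypothetical names. The context-window row is left out because the source does not say how prompt size is measured.

```javascript
// Illustrative only: mirrors the routing table, not the gateway's implementation.
function detectCapabilities(body) {
  const capabilities = [];

  // A tools array in the request implies tool calling.
  if (Array.isArray(body.tools)) {
    capabilities.push('tool_calling');
  }

  // response_format: { type: "json_object" } implies JSON mode.
  if (body.response_format?.type === 'json_object') {
    capabilities.push('json_mode');
  }

  // Any image_url part in any message implies vision.
  const hasImage = (body.messages ?? []).some(
    (message) =>
      Array.isArray(message.content) &&
      message.content.some((part) => part.type === 'image_url'),
  );
  if (hasImage) {
    capabilities.push('vision');
  }

  return capabilities;
}
```

A request that returns several labels will only be routed to models that satisfy all of them, so combining features shrinks the candidate set.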
Non-Streaming Response
A standard OpenAI-compatible response object is returned. The gateway adds extra diagnostic headers:
| Header | Description |
|---|---|
| x-gateway-provider | The backend provider that served the request (e.g. groq, gemini). |
| x-gateway-model | The exact model used by the provider. |
| x-gateway-attempts | Number of provider attempts before a successful response. |
| x-gateway-request-id | Unique request identifier for support and log correlation. |
| x-gateway-reasoning-effort | The reasoning effort level applied to the request. |
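
For logging, the diagnostic headers can be gathered into a single object. The following is a minimal sketch: readGatewayHeaders is a hypothetical helper, and it assumes x-gateway-attempts carries a plain integer string, which the table implies but does not state.

```javascript
// Header names taken from the table above.
const GATEWAY_HEADERS = [
  'x-gateway-provider',
  'x-gateway-model',
  'x-gateway-attempts',
  'x-gateway-request-id',
  'x-gateway-reasoning-effort',
];

// Hypothetical helper: accepts any object with a get(name) method, such as response.headers.
function readGatewayHeaders(headers) {
  const info = {};
  for (const name of GATEWAY_HEADERS) {
    const value = headers.get(name);
    // Skip headers that are absent (null for Headers, undefined for Map).
    if (value != null) {
      info[name.replace('x-gateway-', '')] = value;
    }
  }

  // Assumption: the attempts header is an integer string.
  if (info.attempts !== undefined) {
    info.attempts = Number(info.attempts);
  }
  return info;
}
```

Calling readGatewayHeaders(response.headers) after a fetch returns only the headers that were actually present, keyed by the part of the name after x-gateway-.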
Example Response

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1714000000,
      "model": "llama-3.3-70b-versatile",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The capital of France is Paris."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 15,
        "completion_tokens": 9,
        "total_tokens": 24
      }
    }

Streaming Response
When stream: true, the response is a stream of data: lines in SSE format, each containing a JSON delta object. The stream is terminated by data: [DONE].

    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}
    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}
    data: [DONE]

Examples
Non-Streaming
Using curl:

    curl https://your-gateway.workers.dev/v1/chat/completions \
      -H "Authorization: Bearer <GATEWAY_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "auto",
        "project_id": "my_project",
        "messages": [
          { "role": "system", "content": "You are a helpful assistant." },
          { "role": "user", "content": "What is the capital of France?" }
        ],
        "temperature": 0.7,
        "max_tokens": 256
      }'

Using fetch (JavaScript):

    const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer <GATEWAY_API_KEY>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'auto',
        project_id: 'my_project', // required unless x-gateway-project-id is sent
        messages: [
          { role: 'system', content: 'You are a helpful assistant.' },
          { role: 'user', content: 'What is the capital of France?' },
        ],
        temperature: 0.7,
        max_tokens: 256,
      }),
    });

    const data = await response.json();
    console.log(data.choices[0].message.content);

    // Inspect gateway headers
    console.log('Provider:', response.headers.get('x-gateway-provider'));
    console.log('Model:', response.headers.get('x-gateway-model'));

Streaming
Using curl:

    curl https://your-gateway.workers.dev/v1/chat/completions \
      -H "Authorization: Bearer <GATEWAY_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "auto",
        "project_id": "my_project",
        "messages": [{ "role": "user", "content": "Tell me a short story." }],
        "stream": true
      }'

Using fetch (JavaScript, Node.js):

    const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer <GATEWAY_API_KEY>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'auto',
        project_id: 'my_project', // required unless x-gateway-project-id is sent
        messages: [{ role: 'user', content: 'Tell me a short story.' }],
        stream: true,
      }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';      // holds a partial line until the rest of it arrives
    let finished = false; // set once data: [DONE] is seen

    while (!finished) {
      const { done, value } = await reader.read();
      if (done) break;

      // stream: true keeps multi-byte characters intact across chunk boundaries
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last element may be an incomplete line

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const payload = line.slice(6).trim();
        if (payload === '[DONE]') {
          finished = true; // also stops the outer read loop
          break;
        }

        const chunk = JSON.parse(payload);
        const text = chunk.choices?.[0]?.delta?.content ?? '';
        process.stdout.write(text);
      }
    }

With Reasoning Effort
Using curl:

    curl https://your-gateway.workers.dev/v1/chat/completions \
      -H "Authorization: Bearer <GATEWAY_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "auto",
        "project_id": "my_project",
        "messages": [{ "role": "user", "content": "Solve: x^2 - 5x + 6 = 0" }],
        "min_reasoning_level": "high"
      }'

Using fetch (JavaScript):

    const response = await fetch('https://your-gateway.workers.dev/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer <GATEWAY_API_KEY>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'auto',
        project_id: 'my_project', // required unless x-gateway-project-id is sent
        messages: [{ role: 'user', content: 'Solve: x^2 - 5x + 6 = 0' }],
        min_reasoning_level: 'high',
      }),
    });

    const data = await response.json();
    console.log(data.choices[0].message.content);

Tool Calling (Agentic)
When you include tools, the gateway automatically routes to a model that supports function calling (Groq, Gemini, SambaNova, NVIDIA, Cerebras, or OpenRouter models with tool support).

Using curl:

    curl https://your-gateway.workers.dev/v1/chat/completions \
      -H "Authorization: Bearer <GATEWAY_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "auto",
        "project_id": "my_project",
        "messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
              "type": "object",
              "properties": { "location": { "type": "string" } },
              "required": ["location"]
            }
          }
        }],
        "tool_choice": "auto"
      }'

Using the OpenAI SDK (JavaScript):

    import OpenAI from 'openai';

    const client = new OpenAI({
      baseURL: 'https://your-gateway.workers.dev/v1',
      apiKey: '<GATEWAY_API_KEY>',
    });

    const completion = await client.chat.completions.create(
      {
        model: 'auto',
        messages: [{ role: 'user', content: 'What is the weather in San Francisco?' }],
        tools: [{
          type: 'function',
          function: {
            name: 'get_weather',
            description: 'Get current weather for a location',
            parameters: {
              type: 'object',
              properties: { location: { type: 'string' } },
              required: ['location'],
            },
          },
        }],
        tool_choice: 'auto',
      },
      // Project ID sent via the documented header (extra_body is specific to the Python SDK)
      { headers: { 'x-gateway-project-id': 'my_project' } },
    );

    // tool_calls is absent when the model answers directly
    const toolCall = completion.choices[0].message.tool_calls?.[0];
    console.log(toolCall?.function.name, toolCall?.function.arguments);

JSON Mode (Structured Output)
When you set response_format, the gateway only picks models that support JSON mode.

Using curl:

    curl https://your-gateway.workers.dev/v1/chat/completions \
      -H "Authorization: Bearer <GATEWAY_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "auto",
        "project_id": "my_project",
        "messages": [{ "role": "user", "content": "List 3 programming languages with their year of creation as JSON" }],
        "response_format": { "type": "json_object" }
      }'

Using the OpenAI SDK (JavaScript):

    // client is the OpenAI instance created in the tool calling example
    const completion = await client.chat.completions.create(
      {
        model: 'auto',
        messages: [{ role: 'user', content: 'List 3 programming languages with their year of creation as JSON' }],
        response_format: { type: 'json_object' },
      },
      // Project ID sent via the documented header (extra_body is specific to the Python SDK)
      { headers: { 'x-gateway-project-id': 'my_project' } },
    );

    const parsed = JSON.parse(completion.choices[0].message.content);
    console.log(parsed);

Vision (Image Input)
Send images using the multimodal content format. The gateway auto-detects images and routes to vision-capable models (Gemini, Groq Llama 4).

Using curl:

    curl https://your-gateway.workers.dev/v1/chat/completions \
      -H "Authorization: Bearer <GATEWAY_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "auto",
        "project_id": "my_project",
        "messages": [{
          "role": "user",
          "content": [
            { "type": "text", "text": "What is in this image?" },
            { "type": "image_url", "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg" } }
          ]
        }]
      }'

Using the OpenAI SDK (JavaScript):

    // client is the OpenAI instance created in the tool calling example
    const completion = await client.chat.completions.create(
      {
        model: 'auto',
        messages: [{
          role: 'user',
          content: [
            { type: 'text', text: 'Describe this image in detail.' },
            { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
          ],
        }],
      },
      // Project ID sent via the documented header (extra_body is specific to the Python SDK)
      { headers: { 'x-gateway-project-id': 'my_project' } },
    );

    console.log(completion.choices[0].message.content);