
Chat Completions

POST /v1/chat/completions
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | "auto" | Model identifier. Use "auto" for health-aware round-robin across all providers, or specify a model from GET /v1/models. |
| messages | array | required | Conversation history. Each item has role, content, and optional name. |
| stream | boolean | false | When true, responses are streamed as server-sent events. |
| temperature | number | | Sampling temperature (0–2). Higher values produce more varied output. |
| max_tokens | number | | Maximum number of tokens to generate. |
| min_reasoning_level | string | | Minimum reasoning tier for auto-routing. One of "low", "medium", "high". |
| tools | array | | List of tool/function definitions. When present, the gateway only routes to models that support tool calling. |
| tool_choice | string or object | | Controls tool use: "none", "auto", "required", or { type: "function", function: { name: "..." } }. |
| response_format | object | | Set to { type: "json_object" } for structured JSON output. The gateway only routes to models with JSON mode support. |
| project_id | string | required* | Project tag for analytics and rate accounting. *Required unless you send the x-gateway-project-id header instead; one of the two must be present. |
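For clients other than curl, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official SDK: the helper name build_chat_request and its parameter order are hypothetical. It sends the project tag in the x-gateway-project-id header rather than the project_id body field, which the table above lists as an accepted alternative.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, project_id, messages, model="auto", **params):
    """Build (but do not send) a POST /v1/chat/completions request.

    Hypothetical helper: the project tag travels in the x-gateway-project-id
    header, so no project_id field is added to the JSON body. Extra keyword
    arguments (temperature, max_tokens, stream, ...) go into the body as-is.
    """
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "x-gateway-project-id": project_id,
        },
        method="POST",
    )


# Example (performs a network call, so it is left commented out):
# req = build_chat_request("https://your-gateway.workers.dev", "<GATEWAY_API_KEY>",
#                          "my_project", [{"role": "user", "content": "Hello"}],
#                          temperature=0.7)
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
```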

Messages support both text and multimodal (vision) content:

```json
{
  "role": "user",
  "content": "What is the capital of France?",
  "name": "alice"
}
```

For vision requests, use the array content format with image URLs:

```json
{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg" } }
  ]
}
```

When the gateway detects image_url content parts, it automatically filters to vision-capable models (e.g. Gemini, Groq Llama 4).

| Field | Type | Description |
|---|---|---|
| role | string | One of "system", "user", "assistant", "tool". |
| content | string or array | Message text, or array of content parts for multimodal input. |
| name | string | Optional display name for the message author. |
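Because the gateway switches to vision-capable models whenever an image_url part is present, a client can keep plain string content for text-only turns and use the array form only when images are attached. A minimal Python sketch; user_message is a hypothetical helper, not part of the gateway.

```python
def user_message(text, image_urls=(), name=None):
    """Build a user message in either of the two content formats shown above.

    Plain string content is used for text-only turns; the content-part array
    (one text part plus one image_url part per URL) is used only when images
    are attached, since that is what triggers vision-only routing.
    """
    if image_urls:
        content = [{"type": "text", "text": text}]
        content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    else:
        content = text
    message = {"role": "user", "content": content}
    if name is not None:
        message["name"] = name  # optional display name for the author
    return message
```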

The gateway automatically detects what capabilities your request needs and filters models accordingly:

| Request feature | Detected capability | Effect |
|---|---|---|
| tools array present | Tool calling | Only routes to models that support function calling |
| response_format: { type: "json_object" } | JSON mode | Only routes to models with structured output |
| image_url in message content | Vision | Only routes to vision-capable models |
| Large prompt | Context window | Excludes models whose context window is too small |

This is fully automatic; no extra headers or configuration are needed.
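The gateway's own detection code is not shown on this page; the Python sketch below only mirrors the rules in the table, which can help when predicting which models a request is eligible for. The model records (id, capabilities, context_window), the capability labels, and the 4-characters-per-token size estimate are all assumptions for illustration.

```python
def required_capabilities(body):
    """Approximate the capability filters from the table for one request body."""
    caps = set()
    if body.get("tools"):
        caps.add("tools")
    fmt = body.get("response_format") or {}
    if fmt.get("type") == "json_object":
        caps.add("json_mode")
    for message in body.get("messages", []):
        content = message.get("content")
        # Vision is needed as soon as any content part is an image_url.
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url" for part in content
        ):
            caps.add("vision")
    return caps


def estimate_prompt_tokens(body):
    """Rough prompt size, assuming about 4 characters per token (a heuristic)."""
    chars = 0
    for message in body.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            chars += len(content)
        elif isinstance(content, list):
            chars += sum(len(p.get("text", "")) for p in content if isinstance(p, dict))
    return chars // 4 + 1


def eligible_models(models, body):
    """Keep models that have every required capability and a big enough context window."""
    needed = required_capabilities(body)
    prompt_tokens = estimate_prompt_tokens(body)
    return [
        m["id"]
        for m in models
        if needed <= set(m["capabilities"]) and m["context_window"] >= prompt_tokens
    ]
```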

A standard OpenAI-compatible response object is returned. The gateway adds extra diagnostic headers:

| Header | Description |
|---|---|
| x-gateway-provider | The backend provider that served the request (e.g. groq, gemini). |
| x-gateway-model | The exact model used by the provider. |
| x-gateway-attempts | Number of provider attempts before a successful response. |
| x-gateway-request-id | Unique request identifier for support and log correlation. |
| x-gateway-reasoning-effort | The reasoning effort level applied to the request. |
Example response body:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 9,
    "total_tokens": 24
  }
}
```
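A client can log the diagnostic headers next to the reply text, which makes support requests easier to correlate. In this Python sketch the helper names are hypothetical; it only assumes the headers arrive as a mapping with an items() method, such as the headers attribute of a urllib response, and that the body has the shape of the example response above.

```python
def gateway_diagnostics(headers):
    """Collect x-gateway-* headers (matched case-insensitively) into a dict."""
    diag = {}
    for name, value in headers.items():
        key = name.lower()
        if key.startswith("x-gateway-"):
            diag[key] = value
    # x-gateway-attempts is a count, so expose it as an int when it parses.
    attempts = diag.get("x-gateway-attempts")
    if attempts is not None and attempts.strip().isdigit():
        diag["x-gateway-attempts"] = int(attempts)
    return diag


def reply_text(body):
    """Return the assistant text from a non-streaming chat.completion body."""
    return body["choices"][0]["message"]["content"]


# Example with urllib (network call, left commented out):
# with urllib.request.urlopen(req) as resp:
#     diag = gateway_diagnostics(resp.headers)
#     text = reply_text(json.load(resp))
```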

When stream is true, the response is a server-sent events (SSE) stream: each data: line carries one chat.completion.chunk JSON object whose choices contain a delta with the newly generated content. The stream ends with data: [DONE].

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant","content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1714000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]
```
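To reassemble the text on the client, read data: lines, stop at [DONE], and concatenate each choice's delta.content. A minimal Python sketch; iter_content_deltas is a hypothetical helper, and it assumes each event is a single data: line as in the sample stream above (multi-line SSE data fields are not handled).

```python
import json


def iter_content_deltas(lines):
    """Yield text deltas from chat.completion.chunk events until data: [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream; ignore anything after it
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# With urllib, the response object yields raw byte lines (network call, commented out):
# with urllib.request.urlopen(req) as resp:
#     for text in iter_content_deltas(resp):
#         print(text, end="", flush=True)
```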
A basic request:

```shell
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
A streaming request (curl's -N flag turns off output buffering so chunks are printed as they arrive):

```shell
curl -N https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "Tell me a short story." }],
    "stream": true
  }'
```
Requesting a minimum reasoning level:

```shell
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "Solve: x^2 - 5x + 6 = 0" }],
    "min_reasoning_level": "high"
  }'
```

When you include tools, the gateway automatically routes to a model that supports function calling (Groq, Gemini, SambaNova, NVIDIA, Cerebras, or OpenRouter models with tool support).

```shell
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
```
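This page does not show the second leg of a tool-calling exchange. Assuming the gateway passes through OpenAI-style tool_calls on the assistant message and accepts role "tool" messages carrying a tool_call_id (the OpenAI convention; tool_call_id is not listed in the message table above), a client could build the follow-up messages as in this Python sketch. The helper name and the handlers mapping are hypothetical.

```python
import json


def tool_result_messages(response_body, handlers):
    """Turn the assistant's tool_calls into messages for the follow-up request.

    handlers maps a function name (e.g. a hypothetical local get_weather)
    to a Python callable that accepts the tool's arguments as keywords.
    """
    message = response_body["choices"][0]["message"]
    followups = [message]  # echo the assistant turn, including its tool_calls
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        result = handlers[fn["name"]](**args)
        followups.append({
            "role": "tool",
            "tool_call_id": call["id"],  # links the result to the originating call
            "content": json.dumps(result),
        })
    return followups
```

Append the returned messages to the original messages list and send it again with the same tools so the model can answer using the tool results.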

When you set response_format, the gateway only picks models that support JSON mode.

```shell
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{ "role": "user", "content": "List 3 programming languages with their year of creation as JSON" }],
    "response_format": { "type": "json_object" }
  }'
```
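In JSON mode the reply still arrives as a string in choices[0].message.content, and its shape is set by your prompt rather than by a schema, so it is worth decoding and checking on the client. A minimal Python sketch; parse_json_reply is a hypothetical helper.

```python
import json


def parse_json_reply(body):
    """Decode the assistant's JSON-mode reply into Python objects."""
    content = body["choices"][0]["message"]["content"]
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        # For example, a reply cut off by max_tokens may not be complete JSON.
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
```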

Send images using the multimodal content format. The gateway auto-detects images and routes to vision-capable models (Gemini, Groq Llama 4).

```shell
curl https://your-gateway.workers.dev/v1/chat/completions \
  -H "Authorization: Bearer <GATEWAY_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "project_id": "my_project",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg" } }
      ]
    }]
  }'
```
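The examples above pass a public HTTPS URL. Many OpenAI-compatible vision providers also accept an inline base64 data: URL in the same url field; this page does not say whether the gateway forwards those, so treat the following Python sketch (with hypothetical helper names) as an assumption to verify against your deployment.

```python
import base64


def image_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data: URL (assumes the gateway accepts these)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(url):
    """Wrap an image URL in the image_url content-part format used above."""
    return {"type": "image_url", "image_url": {"url": url}}


# Example:
# with open("photo.jpg", "rb") as f:
#     part = image_part(image_data_url(f.read()))
```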