LLMTest

Proxy / API Gateway

Route your LLM calls through LLMTest for automatic fallbacks, JSON recovery, and cost tracking. Compatible with any OpenAI-format SDK.

Zero config required. Fallbacks and JSON recovery work automatically the moment you point your app at LLMTest. No setup, no feature flags, no extra code.

Setup

Set your base URL to https://llmtest.io/v1, use your LLMTest API key, and pick any of our 340+ supported models. If you have an existing AI integration, just swap the base URL; everything else stays the same.

const response = await fetch("https://llmtest.io/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer your-llmt_-key-here",
    "Content-Type": "application/json",
    "X-Flow": "my-feature-name",        // optional: tag calls by feature
  },
  body: JSON.stringify({
    model: "gpt-4o",                    // any model from llmtest.io/models
    messages: [{ role: "user", content: "Hello" }],
  }),
});

// Works with the openai npm package — just change baseURL and apiKey
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-llmt_-key-here",        // your LLMTest key
  baseURL: "https://llmtest.io/v1",     // LLMTest proxy
  defaultHeaders: {
    "X-Flow": "my-feature-name",
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",                      // any model from llmtest.io/models
  messages: [{ role: "user", content: "Hello" }],
});

# Works with the openai Python package — just change base_url and api_key
from openai import OpenAI

client = OpenAI(
    api_key="your-llmt_-key-here",       # your LLMTest key
    base_url="https://llmtest.io/v1",    # LLMTest proxy
    default_headers={"X-Flow": "my-feature-name"},
)

response = client.chat.completions.create(
    model="gpt-4o",                      # any model from llmtest.io/models
    messages=[{"role": "user", "content": "Hello"}],
)

curl -X POST "https://llmtest.io/v1/chat/completions" \
  -H "Authorization: Bearer your-llmt_-key-here" \
  -H "Content-Type: application/json" \
  -H "X-Flow: my-feature-name" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Any model, one API. You're not limited to OpenAI models. Use anthropic/claude-sonnet-4, google/gemini-2.5-flash, meta-llama/llama-4-scout, or any of our 340+ models — all through the same endpoint and SDK.
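
For example, reusing the client from the SDK snippet above, switching providers is just a different model string (the model IDs here are from the list above):

// Same client and endpoint; only the model string changes
const claudeResponse = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4",   // any model ID from llmtest.io/models
  messages: [{ role: "user", content: "Hello" }],
});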

What happens automatically

Once you're routing through LLMTest, you get all of this with no extra code:

- Automatic fallbacks: if the model you requested returns an HTTP error or is down, your request is retried on a fallback model.
- JSON recovery: when you request JSON and the model returns invalid JSON, the call is retried on a fallback model.
- Cost tracking: cost, latency, and model usage are recorded for every call and broken down per flow on your dashboard.

Detecting fallbacks (optional)

Your app works without checking for fallbacks — the response format is always the same. But if you want to know when a fallback happened (for logging, monitoring, etc.), check these response headers:

x-llmtest-fallback-model (string, optional)
The model that actually served the response. Only set when a fallback occurred.

x-llmtest-content-fallback (boolean, optional)
"true" when the fallback was triggered by invalid JSON content rather than an HTTP error.

Example — logging fallbacks:

const response = await fetch("https://llmtest.io/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer your-llmt_-key-here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "List 3 colors" }],
    response_format: { type: "json_object" },
  }),
});

// Check if a fallback was used
const fallbackModel = response.headers.get("x-llmtest-fallback-model");
if (fallbackModel) {
  console.log("Fallback used:", fallbackModel);
  const wasJsonIssue = response.headers.get("x-llmtest-content-fallback") === "true";
  console.log("Reason:", wasJsonIssue ? "bad JSON" : "model down");
}

// Use the response as normal — same format regardless of fallback
const data = await response.json();
const content = JSON.parse(data.choices[0].message.content);
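
If you're using the openai npm package rather than raw fetch, the same headers are available through the SDK's withResponse() helper, which returns the parsed body alongside the raw Response. A minimal sketch, assuming the client from the Setup section:

// Read the fallback headers via the openai npm package
const { data, response: raw } = await client.chat.completions
  .create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "List 3 colors" }],
    response_format: { type: "json_object" },
  })
  .withResponse();

const usedFallback = raw.headers.get("x-llmtest-fallback-model");
if (usedFallback) console.log("Fallback used:", usedFallback);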

Flows

A flow is a named group of API calls — typically one per feature in your app (e.g. "support-bot", "email-writer", "code-reviewer"). Flows let you track costs, latency, and model usage per feature, and run benchmarks on each one independently.

To create a flow, add the X-Flow header to your requests:

"X-Flow": "support-bot"    // tag this call as part of the "support-bot" flow

Flows appear on your dashboard automatically the first time a call with that flow name hits the proxy. If you see "No flows yet" on your dashboard, it means no proxy calls have been made yet — send your first request and the flow will appear.

If you omit the X-Flow header, calls are grouped under the default "unknown" flow. This still works, but you lose the ability to track costs per feature or benchmark individual flows.

Naming tips: Use short, descriptive names. Lowercase with hyphens works well. Examples: support-bot, invoice-parser, content-moderation.
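
If one process serves several features, you can keep a single client and tag each call individually: with the openai npm package, per-request headers override the client's defaultHeaders. A short sketch (the flow name is illustrative):

// Tag calls from different features with different flows
const ticketSummary = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Summarize this support ticket" }],
  },
  { headers: { "X-Flow": "support-bot" } },   // overrides the client default
);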

Supported endpoints

Currently supported:

POST /v1/chat/completions (Chat Completions)

The request and response formats follow the OpenAI chat completions standard. Any SDK or HTTP client that supports this format works, including the official OpenAI and Anthropic SDKs and most open-source SDKs.