LLMTest

Fallbacks

LLMTest automatically handles API failures and bad responses so your users never notice when a model goes down. No configuration needed.

Nothing to set up. Fallbacks are always on. If you're routing through the proxy, your app already has automatic failover and JSON recovery.

How it works

When something goes wrong with a model call, the proxy automatically retries with a different model. Your app receives a normal response — same format, same structure — as if nothing happened.

When a model is down

If the model returns a retryable error (429, 500, 502, 503, 504) or doesn't respond within 55 seconds, the proxy retries with an alternative model. Up to 3 attempts.
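
If you're curious what that looks like, the sketch below mirrors the rules described above (retryable status codes, the 55-second timeout, the 3-attempt cap). It is illustrative TypeScript only, not LLMTest's actual implementation; the upstream URL and the FALLBACK_MODELS list are made up for the example.

// Rough sketch of the retry rules described above. Not LLMTest's real code:
// the upstream URL and the fallback model list here are placeholders.
const RETRYABLE = new Set([429, 500, 502, 503, 504]);
const TIMEOUT_MS = 55_000;
const MAX_ATTEMPTS = 3;
const FALLBACK_MODELS = ["google/gemini-2.5-flash", "gpt-4o-mini"]; // hypothetical list

async function completeWithRetries(body: Record<string, unknown>): Promise<Response> {
  const models = [body.model as string, ...FALLBACK_MODELS];
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    try {
      const res = await fetch("https://upstream.example/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ ...body, model: models[attempt] }),
        signal: AbortSignal.timeout(TIMEOUT_MS), // give up after 55 seconds
      });
      if (!RETRYABLE.has(res.status)) return res; // success, or an error that isn't retryable
    } catch {
      // Timeout or network failure: fall through and try the next model.
    }
  }
  throw new Error("All attempts failed");
}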

Fallback models are selected for:

When JSON is broken

If your request includes response_format: { type: "json_object" }, the proxy verifies that the response content is actually valid JSON. Here's what happens:

  1. Model compatibility check — some models don't support response_format. The proxy detects this and strips the parameter to avoid 400 errors. Your request still goes through.
  2. Content validation — after receiving the response, the proxy checks that choices[0].message.content is valid JSON.
  3. Auto-extraction — if the model wraps JSON in prose (e.g. "Sure, here's your JSON:" followed by {"colors": [...]}), the proxy extracts the clean JSON and returns only that. Your app gets valid JSON without the prose.
  4. Fallback — if the content is not valid JSON and can't be extracted, the proxy retries with a different model.

In short: send response_format: { type: "json_object" } in your request and the proxy guarantees you get valid JSON back — or exhausts all fallbacks trying.
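
For example, a client using JSON mode can parse the returned content directly. The snippet below is a minimal TypeScript sketch rather than an official SDK; the endpoint URL and key placeholder are copied from the curl examples further down this page.

const res = await fetch("https://llmtest.io/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer your-llmt_-key-here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Return a JSON object with 3 colors" }],
    response_format: { type: "json_object" },
  }),
});

const completion = await res.json();
// The proxy has already validated (or extracted) the JSON in message.content,
// so this parse should not throw under normal operation.
const data = JSON.parse(completion.choices[0].message.content);
console.log(data);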

Streaming + JSON

When you send both stream: true and response_format: { type: "json_object" }, the proxy buffers the full stream behind the scenes, validates the assembled content, and then replays the chunks to your client. If validation fails, it retries with a fallback model. Your client still receives a standard SSE stream.
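
As an illustration, here is one way a client could assemble such a stream with plain fetch, assuming the standard OpenAI-style SSE chunk format. This is a sketch, not a required integration; the endpoint and key placeholder are taken from the curl examples below.

const res = await fetch("https://llmtest.io/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer your-llmt_-key-here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    stream: true,
    messages: [{ role: "user", content: "Return a JSON object with 3 colors" }],
    response_format: { type: "json_object" },
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffered = "";
let content = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffered += decoder.decode(value, { stream: true });

  // Standard SSE framing: one "data: ..." payload per line.
  const lines = buffered.split("\n");
  buffered = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
    const chunk = JSON.parse(line.slice("data: ".length));
    content += chunk.choices?.[0]?.delta?.content ?? "";
  }
}

// The proxy validated the assembled content before replaying the chunks,
// so the concatenated deltas should parse as JSON.
console.log(JSON.parse(content));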

Detecting fallbacks in your code

Your app doesn't need to handle fallbacks — the response format is always the same. But if you want to know when one happened (for logging or monitoring), check the response headers:

x-llmtest-fallback-model: google/gemini-2.5-flash
# The model that actually served the response. Only present when a fallback occurred.

x-llmtest-content-fallback: true
# "true" = the fallback was triggered by invalid JSON content.
# absent or "false" = it was triggered by an HTTP error or timeout.
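
For instance, you could wrap your proxy calls in a small helper that logs these headers. The sketch below uses plain fetch; chatWithFallbackLogging is a hypothetical name, and the endpoint and key placeholder follow the curl examples below.

async function chatWithFallbackLogging(body: unknown): Promise<Response> {
  const res = await fetch("https://llmtest.io/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": "Bearer your-llmt_-key-here",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });

  // Only present when a fallback occurred.
  const fallbackModel = res.headers.get("x-llmtest-fallback-model");
  if (fallbackModel) {
    const reason =
      res.headers.get("x-llmtest-content-fallback") === "true"
        ? "invalid JSON content"
        : "HTTP error or timeout";
    console.warn(`Served by fallback model ${fallbackModel} (${reason})`);
  }
  return res;
}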

Testing fallbacks

You can simulate failures to verify your app handles fallbacks correctly. Add these query parameters to your proxy URL:

Simulate a model outage

curl -X POST "https://llmtest.io/v1/chat/completions?test_fallback=true" \
  -H "Authorization: Bearer your-llmt_-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'

# The first model attempt returns a 503. The proxy retries with a fallback.
# Check the x-llmtest-fallback-model header in the response.

Simulate a timeout

curl -X POST "https://llmtest.io/v1/chat/completions?test_timeout=true" \
  -H "Authorization: Bearer your-llmt_-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
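
# The first model attempt times out. The proxy retries with a fallback.
# Check the x-llmtest-fallback-model header in the response.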

Simulate bad JSON

curl -X POST "https://llmtest.io/v1/chat/completions?test_bad_json=true" \
  -H "Authorization: Bearer your-llmt_-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Return a JSON object with 3 colors"}],
    "response_format": {"type": "json_object"}
  }'

# The first model returns invalid JSON. The proxy detects it and retries.
# Response will have x-llmtest-content-fallback: true