LLMTest automatically handles API failures and bad responses so your users never notice when a model goes down. No configuration needed.
When something goes wrong with a model call, the proxy automatically retries with a different model. Your app receives a normal response — same format, same structure — as if nothing happened.
If the model returns a retryable error (429, 500, 502, 503, 504) or doesn't respond within 55 seconds, the proxy retries with an alternative model. Up to 3 attempts.
Fallback models are selected for:
If your request includes response_format: { type: "json_object" }, the proxy validates the response content is actually valid JSON. Here's what happens:
- If a model doesn't support response_format, the proxy detects this and strips the parameter to avoid 400 errors. Your request still goes through.
- The proxy checks that choices[0].message.content is valid JSON. If it isn't, the proxy retries with a fallback model.
- If the model wraps the JSON in prose (e.g. "Sure, here's your JSON: {"colors": [...]}"), the proxy extracts the clean JSON and returns only that. Your app gets valid JSON without the prose.

In short: send response_format: { type: "json_object" } in your request and the proxy guarantees you get valid JSON back, or exhausts all fallbacks trying.
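That guarantee means your client can parse the returned content directly. Here's a minimal sketch using Python's requests library; the API key is a placeholder and the model name is just an example:

```python
import json
import requests

# Sketch: request JSON output through the proxy. By the time a response
# reaches your app, choices[0].message.content is guaranteed to be valid JSON
# (or the request fails after all fallbacks are exhausted).
resp = requests.post(
    "https://llmtest.io/v1/chat/completions",
    headers={"Authorization": "Bearer your-llmt_-key-here"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Return a JSON object with 3 colors"}],
        "response_format": {"type": "json_object"},
    },
)
resp.raise_for_status()

content = resp.json()["choices"][0]["message"]["content"]
colors = json.loads(content)  # parses cleanly: the proxy already validated it
print(colors)
```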
When you send both stream: true and response_format: { type: "json_object" }, the proxy buffers the full stream behind the scenes, validates the assembled content, and then replays the chunks to your client. If validation fails, it retries with a fallback model. Your client still receives a standard SSE stream.
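From the client's side nothing changes: you read SSE chunks as usual and the assembled content parses as JSON. Here's a sketch with Python's requests, assuming the standard OpenAI-style chunk format (choices[0].delta.content, terminated by data: [DONE]):

```python
import json
import requests

# Sketch: streaming with JSON validation enabled. The proxy buffers and
# validates behind the scenes; the client just consumes an ordinary SSE stream.
resp = requests.post(
    "https://llmtest.io/v1/chat/completions",
    headers={"Authorization": "Bearer your-llmt_-key-here"},
    json={
        "model": "gpt-4o",
        "stream": True,
        "response_format": {"type": "json_object"},
        "messages": [{"role": "user", "content": "Return a JSON object with 3 colors"}],
    },
    stream=True,
)

parts = []
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        delta = choice.get("delta", {}).get("content")
        if delta:
            parts.append(delta)

print(json.loads("".join(parts)))  # the assembled content is valid JSON
```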
Your app doesn't need to handle fallbacks — the response format is always the same. But if you want to know when one happened (for logging or monitoring), check the response headers:
```
x-llmtest-fallback-model: google/gemini-2.5-flash
# The model that actually served the response. Only present when a fallback occurred.

x-llmtest-content-fallback: true
# "true" = the fallback was triggered by invalid JSON content.
# absent or "false" = it was triggered by an HTTP error or timeout.
```
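A thin logging wrapper only needs to look at those two headers; everything else about the response is unchanged. A sketch in Python with requests (key and model are placeholders):

```python
import requests

# Sketch: log fallbacks without changing how the response is consumed.
resp = requests.post(
    "https://llmtest.io/v1/chat/completions",
    headers={"Authorization": "Bearer your-llmt_-key-here"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Say hello"}],
    },
)

fallback_model = resp.headers.get("x-llmtest-fallback-model")
if fallback_model:
    reason = (
        "invalid JSON content"
        if resp.headers.get("x-llmtest-content-fallback") == "true"
        else "HTTP error or timeout"
    )
    print(f"fallback to {fallback_model} ({reason})")
```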
You can simulate failures to verify your app handles fallbacks correctly. Add these query parameters to your proxy URL:
curl -X POST "https://llmtest.io/v1/chat/completions?test_fallback=true" \
-H "Authorization: Bearer your-llmt_-key-here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Say hello"}]
}'
# The first model attempt returns a 503. The proxy retries with a fallback.
# Check the x-llmtest-fallback-model header in the response.curl -X POST "https://llmtest.io/v1/chat/completions?test_timeout=true" \
-H "Authorization: Bearer your-llmt_-key-here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Say hello"}]
}'curl -X POST "https://llmtest.io/v1/chat/completions?test_bad_json=true" \
-H "Authorization: Bearer your-llmt_-key-here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Return a JSON object with 3 colors"}],
"response_format": {"type": "json_object"}
}'
# The first model returns invalid JSON. The proxy detects it and retries.
# Response will have x-llmtest-content-fallback: true
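These parameters also work in automated tests. Here's a sketch of a pytest-style check, using the same placeholder key, that the invalid-JSON simulation still ends in a successful response with the content-fallback header set:

```python
import json
import requests

def test_bad_json_triggers_content_fallback():
    # Force the first model attempt to return invalid JSON; the proxy
    # should retry, and the final response should still be valid JSON.
    resp = requests.post(
        "https://llmtest.io/v1/chat/completions",
        params={"test_bad_json": "true"},
        headers={"Authorization": "Bearer your-llmt_-key-here"},
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Return a JSON object with 3 colors"}],
            "response_format": {"type": "json_object"},
        },
    )
    assert resp.status_code == 200
    assert resp.headers.get("x-llmtest-content-fallback") == "true"
    json.loads(resp.json()["choices"][0]["message"]["content"])  # must parse
```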