How to set up OpenRouter fallback in Node.js in 2026

Week one with OpenRouter goes fine. You pick a model, wire it up, ship the feature. Week two: a 429 at 2am, your primary model is rate-limited, and every call to your app returns an error until you wake up and switch models manually. The fix is one body parameter, but there are four things about it that will catch you if you don't know them in advance.

Wire up the client

No extra packages required if you're on Node 18+. OpenRouter is OpenAI-compatible, so native fetch hits the same endpoint shape:

const OPENROUTER_KEY = process.env.OPENROUTER_API_KEY;
const BASE_URL = 'https://openrouter.ai/api/v1/chat/completions';

async function chat(messages) {
  const res = await fetch(BASE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${OPENROUTER_KEY}`,
      'HTTP-Referer': 'https://your-app.com',
      'X-Title': 'Your App',
    },
    body: JSON.stringify({
      model: 'anthropic/claude-opus-4-7',
      messages,
    }),
  });
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  return res.json();
}

The HTTP-Referer and X-Title headers are optional but worth adding from day one. They appear in your OpenRouter usage dashboard so you can trace which app made which call when you're debugging an unexpected spike.

Add the models array

Replace the single model field with a models array. OpenRouter tries each entry in priority order; the first one that returns a non-error response wins:

async function chatWithFallback(messages) {
  const res = await fetch(BASE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${OPENROUTER_KEY}`,
      'HTTP-Referer': 'https://your-app.com',
      'X-Title': 'Your App',
    },
    body: JSON.stringify({
      models: [
        'anthropic/claude-opus-4-7',
        'openai/gpt-4o',
        'openai/gpt-4o-mini',
      ],
      messages,
    }),
  });
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  return res.json();
}

OpenRouter triggers the fallback on hard failures: 429 rate limits, 503 unavailability, content filter rejections, and context-length overflows for the requested model. If Claude is throttled at 2am, the call moves to gpt-4o silently, and your users see nothing.

Check which model ran

The response includes a model field showing which entry in your models array actually handled the request. Log it every time: you need it for cost attribution and for knowing when fallbacks are firing:

async function main() {
  const messages = [{ role: 'user', content: 'Summarize this for me.' }];
  const data = await chatWithFallback(messages);

  const modelUsed = data.model; // 'openai/gpt-4o' when claude was rate-limited
  const content = data.choices[0].message.content;

  console.log('Model used:', modelUsed);
  console.log(content);
}

main().catch(console.error);

If modelUsed is anything other than your first choice, something went wrong with the primary. A run of consecutive fallbacks in your logs is a signal worth investigating. It usually means a rate limit you haven't noticed yet or a provider incident that's not reflected in their status page.
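One way to make a run of fallbacks visible is to keep a small rolling window of recent calls. The sketch below is illustrative only: createFallbackTracker and logModel are hypothetical helpers, the window size is arbitrary, and it assumes data.model comes back as exactly the slug you requested (if the API returns a more specific identifier, compare by prefix instead).

```javascript
// Hypothetical helper: tracks how often a fallback model served recent calls.
const PRIMARY_MODEL = 'anthropic/claude-opus-4-7';

function createFallbackTracker(primary, windowSize = 50) {
  const recent = []; // one boolean per call: true means a fallback served it
  return {
    record(modelUsed) {
      recent.push(modelUsed !== primary);
      if (recent.length > windowSize) recent.shift(); // drop the oldest call
      const fallbacks = recent.filter(Boolean).length;
      return { fallbacks, total: recent.length, rate: fallbacks / recent.length };
    },
  };
}

const tracker = createFallbackTracker(PRIMARY_MODEL);

// Call this with every parsed response, e.g. logModel(await chatWithFallback(messages)).
function logModel(data) {
  const stats = tracker.record(data.model);
  if (data.model !== PRIMARY_MODEL) {
    console.warn(
      `Fallback served by ${data.model} (${stats.fallbacks}/${stats.total} recent calls)`
    );
  }
  return stats;
}
```

Alerting on stats.rate rather than on a single fallback keeps one transient 429 from paging anyone.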

Four pitfalls from week two

1. model and models conflict. OpenRouter ignores the singular model field when models is present. That's fine when you write the fetch call yourself. It becomes a problem if you're wrapping an SDK. Some LangChain versions and older LiteLLM configs inject model into the request body regardless of what you pass. The fallback still works, but the extra field makes the request body confusing when you're reading logs. Check what your wrapper does with the raw request.

2. Soft failures don't trigger fallback. If claude-opus-4-7 returns HTTP 200 with an empty string, a malformed JSON fragment, or a content refusal ("I can't help with that"), OpenRouter sees a successful response and does not fall back. The bad response lands directly in your application. Validate the response content before using it, and handle the soft-failure case explicitly. The models array is an availability guarantee, not a quality guarantee. The broader fallback chain guide covers soft-failure detection patterns alongside availability fallback.

3. Context window mismatch in the chain. Your fallback chain might be claude-opus-4-7 (200k context), then gpt-4o (128k), then gpt-4o-mini (128k). If your prompt is 150k tokens, Claude handles it fine. When Claude is unavailable, both fallbacks fail with a context-length error: different model, same problem. Design your chain so every entry can handle your P99 prompt size. If you have large-context use cases, put a large-context model at every position in the chain, not just the top.

4. Streaming: model is in the first chunk. When you add stream: true, the model field appears in the opening data: server-sent event, not at the end of the stream. If you aggregate chunks and only inspect the final assembled result, you miss it:

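async function* parseStream(response) {
  const decoder = new TextDecoder();
  let buffer = '';
  for await (const chunk of response.body) {
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // hold back the trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data: ') || trimmed === 'data: [DONE]') continue;
      try { yield JSON.parse(trimmed.slice(6)); } catch { /* skip non-JSON payloads */ }
    }
  }
}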

async function chatStreaming(messages) {
  const res = await fetch(BASE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${OPENROUTER_KEY}`,
    },
    body: JSON.stringify({ models: ['anthropic/claude-opus-4-7', 'openai/gpt-4o'], messages, stream: true }),
  });
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);

  let modelUsed = null;
  for await (const chunk of parseStream(res)) {
    if (!modelUsed && chunk.model) modelUsed = chunk.model;
    // process chunk.choices[0].delta.content here
  }
  console.log('Model used:', modelUsed);
}

Capture model from the first chunk or you lose billing attribution on every streaming call.
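Going back to pitfall 3, a context mismatch can be caught before the request leaves your process. This is a rough sketch under stated assumptions: the limits table simply copies the figures quoted above and should be checked against current model listings, the four-characters-per-token estimate is a crude heuristic for English text rather than a real tokenizer, and estimateTokens and modelsThatFit are hypothetical names.

```javascript
// Context limits as quoted in pitfall 3; verify them before relying on this table.
const CONTEXT_LIMITS = {
  'anthropic/claude-opus-4-7': 200_000,
  'openai/gpt-4o': 128_000,
  'openai/gpt-4o-mini': 128_000,
};

// Assumption: roughly 4 characters per token. Use a real tokenizer for accuracy.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + String(m.content).length, 0);
  return Math.ceil(chars / 4);
}

// Keeps only the chain entries whose window can hold the prompt plus a
// reserve for the completion. Models missing from the table are dropped.
function modelsThatFit(chain, messages, reserve = 4_000) {
  const needed = estimateTokens(messages) + reserve;
  return chain.filter((id) => (CONTEXT_LIMITS[id] ?? 0) >= needed);
}
```

Passing the result of modelsThatFit as the models array means an oversized prompt is sent only to entries that can take it. An empty result tells you up front that no entry in the chain can serve the request.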

When native fallback isn't enough

OpenRouter's models array handles availability. It doesn't handle quality. If your primary model is up but returns a low-quality answer on a specific prompt type, the call succeeds and no fallback triggers.
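Short of a full judge layer, the obvious soft failures can be caught client-side. The sketch below is a minimal illustration, not a complete detector: isSoftFailure and chatValidated are hypothetical helpers, the refusal patterns are examples you would tune for your own prompts, and send stands for any function you supply that posts messages with a given models array and returns the parsed response.

```javascript
// Illustrative refusal openers only; real refusals vary by model and prompt.
const REFUSAL_PATTERNS = [
  /^i can['’]?t help with that/i,
  /^i['’]m sorry, but i can['’]?t/i,
];

// True when an HTTP 200 response is empty, a refusal, or (optionally) not valid JSON.
function isSoftFailure(data, { expectJson = false } = {}) {
  const content = data?.choices?.[0]?.message?.content;
  if (typeof content !== 'string' || content.trim() === '') return true;
  const text = content.trim();
  if (REFUSAL_PATTERNS.some((re) => re.test(text))) return true;
  if (expectJson) {
    try { JSON.parse(text); } catch { return true; }
  }
  return false;
}

// Retries with the remaining chain entries whenever the response is unusable.
// send(messages, models) is supplied by the caller.
async function chatValidated(messages, models, send) {
  for (let i = 0; i < models.length; i++) {
    const data = await send(messages, models.slice(i));
    if (!isSoftFailure(data)) return data;
  }
  throw new Error('Every model in the chain returned an unusable response');
}
```

Each retry drops the current head of the chain, so a primary that keeps refusing a prompt is not asked again for that call. The trade-off is extra latency and cost on every soft failure.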

For quality-gated routing, you need a judge layer on top of availability fallback. LLMTest's fallback docs cover how the proxy adds a lightweight judge pass after each response: if the primary model scores below your configured threshold, it re-routes to the next model and returns the better result. Your call sites stay unchanged when you adjust routing logic in the dashboard.

The rate-limit-specific failover patterns are also worth reading alongside this, specifically the circuit breaker pattern that stops your app from hammering a throttled provider before the models fallback even gets a chance to run.
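For orientation, a circuit breaker in its simplest form looks something like the sketch below. It is a generic illustration of the pattern, not the implementation from the linked guide: createCircuitBreaker is a hypothetical helper, the threshold and cooldown values are placeholders, and the now parameter exists only so the clock can be swapped out in tests.

```javascript
// Minimal circuit breaker: opens after N consecutive failures, then allows
// a trial request once the cooldown has elapsed (the "half-open" state).
function createCircuitBreaker({
  failureThreshold = 3,
  cooldownMs = 30_000,
  now = Date.now,
} = {}) {
  let failures = 0;
  let openedAt = null; // timestamp when the breaker last opened, or null if closed

  return {
    canRequest() {
      if (openedAt === null) return true;      // closed: traffic flows
      return now() - openedAt >= cooldownMs;   // open until the cooldown passes
    },
    recordSuccess() {
      failures = 0;
      openedAt = null;                         // close the breaker again
    },
    recordFailure() {
      failures += 1;
      if (failures >= failureThreshold) openedAt = now(); // (re)open
    },
  };
}
```

A typical arrangement keeps one breaker per model: record a failure on 429 or 503, record a success on a usable response, and leave out any model whose canRequest() returns false when you build the models array.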

One model going down should not take your feature with it. The models array is three lines of JSON and covers the incident at 2am. Add quality gates when you can measure where soft failures are costing you.

Route around rate limits and quality failures from one place: get started with LLMTest in about a minute.
