Failover-Aware Model Fallback
Problem
AI model requests fail for varied and often opaque reasons, and simple retry logic cannot distinguish between:
- Transient failures (timeouts, rate limits) that benefit from retry with backoff
- Semantic failures (auth errors, billing issues) where retry is futile
- User aborts where retry wastes resources and frustrates users
Agents using multiple models or providers need intelligent fallback strategies that respect failure semantics, avoid retry loops, and provide clear diagnostics.
Solution
Semantic error classification with intelligent fallback chains. Each failure is categorized into a specific reason type, and fallback behavior is tailored to that reason. Multi-model fallback chains are configured per-request, with provider-specific allowlists and cooldown tracking.
Core concepts:
- Error classification: Failures are mapped to semantic reason types (`timeout`, `rate_limit`, `auth`, `billing`, `format`, `context_overflow`).
- Reason-aware fallback: Different reasons trigger different fallback behaviors:
  - `timeout`, `rate_limit`: Retry with the next model in the chain.
  - `auth`, `billing`: Fail immediately; retry won't help.
  - `format`, `context_overflow`: May retry with an adjusted request.
- User abort detection: Distinguishes user-initiated aborts from timeout-induced aborts. User aborts rethrow immediately; timeouts trigger fallback.
- Multi-model chains: An ordered list of `{provider, model}` candidates. Each attempt uses the next candidate until one succeeds or the list is exhausted.
- Provider allowlists: Optional per-provider model restrictions prevent fallback to incompatible models.
- Diagnostics tracking: Each failed attempt is recorded with error details, reason, status code, and provider/model for debugging.
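A provider allowlist can be applied as a filter over the candidate chain before any request is sent. The sketch below makes some assumptions. `ProviderAllowlist` is a hypothetical map from provider name to permitted model ids, and providers absent from the map are treated as unrestricted. `filterByAllowlist` is an illustrative name, not the referenced implementation's API.

```typescript
type ModelCandidate = { provider: string; model: string };

// Hypothetical shape: provider -> model ids allowed for this request.
// Providers missing from the map are treated as unrestricted (an assumption).
type ProviderAllowlist = Record<string, readonly string[]>;

// Keeps the chain's order, drops candidates the allowlist forbids,
// and drops exact duplicates so the same model is never tried twice.
function filterByAllowlist(
  candidates: readonly ModelCandidate[],
  allowlist: ProviderAllowlist = {},
): ModelCandidate[] {
  const seen = new Set<string>();
  const result: ModelCandidate[] = [];
  for (const c of candidates) {
    const key = `${c.provider}/${c.model}`;
    if (seen.has(key)) continue;
    const allowed = allowlist[c.provider];
    if (allowed !== undefined && !allowed.includes(c.model)) continue;
    seen.add(key);
    result.push(c);
  }
  return result;
}

// Only gpt-4o is allowed for openai; google is unrestricted.
const chain = filterByAllowlist(
  [
    { provider: "openai", model: "gpt-4o" },
    { provider: "openai", model: "gpt-4o-mini" },
    { provider: "google", model: "gemini-2.0-flash" },
  ],
  { openai: ["gpt-4o"] },
);
```

Filtering up front, rather than inside the retry loop, means a forbidden model never costs a round trip.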
Implementation sketch:
```typescript
type FailoverReason =
  | "timeout"
  | "rate_limit"
  | "auth"
  | "billing"
  | "format"
  | "context_overflow";

type ModelCandidate = {
  provider: string;
  model: string;
};

// One record per failed attempt, kept for diagnostics.
type Attempt = ModelCandidate & {
  error: unknown;
  reason: FailoverReason | null;
  status?: number;
};

async function runWithModelFallback<T>(params: {
  candidates: ModelCandidate[];
  run: (provider: string, model: string) => Promise<T>;
}): Promise<{ result: T; provider: string; model: string; attempts: Attempt[] }> {
  const attempts: Attempt[] = [];
  for (const candidate of params.candidates) {
    try {
      const result = await params.run(candidate.provider, candidate.model);
      return { result, provider: candidate.provider, model: candidate.model, attempts };
    } catch (err) {
      if (isUserAbort(err)) {
        throw err; // User canceled; don't fall back
      }
      const reason = classifyFailoverReason(err);
      if (reason === "auth" || reason === "billing") {
        throw err; // Retry won't help
      }
      attempts.push({ ...candidate, error: err, reason, status: getStatusCode(err) });
      // Continue to the next candidate
    }
  }
  // Join readable messages, not raw error objects
  const summary = attempts
    .map((a) => `${a.provider}/${a.model}: ${getErrorMessage(a.error)}`)
    .join(" | ");
  throw new Error(`All models failed: ${summary}`);
}

function classifyFailoverReason(err: unknown): FailoverReason | null {
  const status = getStatusCode(err);
  if (status === 402) return "billing";
  if (status === 429) return "rate_limit";
  if (status === 401 || status === 403) return "auth";
  if (status === 408) return "timeout";
  const message = getErrorMessage(err).toLowerCase();
  if (message.includes("timeout") || message.includes("timed out")) return "timeout";
  if (message.includes("rate limit") || message.includes("too many requests")) return "rate_limit";
  if (message.includes("context window") || message.includes("context length")) return "context_overflow";
  return null;
}

// Assumed helpers: read an HTTP status and a message from whatever the SDK throws.
function getStatusCode(err: unknown): number | undefined {
  if (!err || typeof err !== "object") return undefined;
  const { status, statusCode } = err as { status?: unknown; statusCode?: unknown };
  const code = status ?? statusCode;
  return typeof code === "number" ? code : undefined;
}

function getErrorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}
```
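The loop above only decides between failing fast and trying the next candidate. The reason list also calls for adjusting the request on `format` and `context_overflow`. One way to make that policy explicit is a reason-to-action table plus an adjustment hook. The sketch uses hypothetical names (`actionFor`, `FallbackAction`, `trimToBudget`, the `Message` shape). Its character count is only a crude stand-in for a real token count.

```typescript
type FailoverReason =
  | "timeout" | "rate_limit" | "auth" | "billing" | "format" | "context_overflow";

// Hypothetical action set; the loop above implements only the first two.
type FallbackAction = "fail_fast" | "next_candidate" | "adjust_and_retry";

function actionFor(reason: FailoverReason | null): FallbackAction {
  switch (reason) {
    case "auth":
    case "billing":
      return "fail_fast";
    case "format":
    case "context_overflow":
      return "adjust_and_retry";
    default:
      // "timeout", "rate_limit", and unclassified (null) errors move down the chain.
      return "next_candidate";
  }
}

type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical adjustment for "context_overflow": keep system messages and the
// latest turn, dropping the oldest non-system messages until the character budget fits.
function trimToBudget(messages: Message[], maxChars: number): Message[] {
  const result = [...messages];
  const size = () => result.reduce((n, m) => n + m.content.length, 0);
  while (size() > maxChars) {
    const idx = result.findIndex((m) => m.role !== "system");
    if (idx === -1 || idx === result.length - 1) break; // nothing safe left to drop
    result.splice(idx, 1);
  }
  return result;
}
```

Keeping the policy in one function makes it easy to audit which reasons can ever reach another provider.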
User abort vs. timeout distinction:
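```typescript
function isUserAbort(err: unknown): boolean {
  // Only treat explicit AbortError names as user aborts;
  // message-based checks (e.g., "aborted") can mask timeouts
  if (!err || typeof err !== "object") return false;
  const name = "name" in err ? String(err.name) : "";
  return name === "AbortError" && !isTimeoutError(err);
}

// Assumed helper: timeouts carry a "TimeoutError" name (as AbortSignal.timeout() does) or a timeout message.
function isTimeoutError(err: object): boolean {
  const name = "name" in err ? String(err.name) : "";
  const message = "message" in err ? String(err.message).toLowerCase() : "";
  return name === "TimeoutError" || message.includes("timeout") || message.includes("timed out");
}
```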
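For `isUserAbort` to work, timeouts must surface with a different error name than user cancellations. The sketch below is a wrapper that gives each model call one `AbortSignal` for both causes but rejects timeouts with a distinct `TimeoutError`. `callWithTimeout` and `ModelTimeoutError` are hypothetical names. The sketch assumes the model call rejects with an `AbortError` when its signal fires, as `fetch` does after a plain `controller.abort()`.

```typescript
// Hypothetical error type: named "TimeoutError" so an AbortError-only check ignores it.
class ModelTimeoutError extends Error {
  timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`Model call timed out after ${timeoutMs} ms`);
    this.name = "TimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

// Runs `call` with a signal that fires on either a user abort or the timeout.
// A timeout rejects with ModelTimeoutError; a user abort surfaces whatever the
// call rejects with when its signal fires (typically an AbortError).
async function callWithTimeout<T>(
  call: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  userSignal?: AbortSignal,
): Promise<T> {
  const controller = new AbortController();
  const onUserAbort = () => controller.abort();
  if (userSignal?.aborted) controller.abort();
  else userSignal?.addEventListener("abort", onUserAbort, { once: true });

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the race settles with the timeout, then cancel the call.
      reject(new ModelTimeoutError(timeoutMs));
      controller.abort();
    }, timeoutMs);
  });

  try {
    return await Promise.race([call(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
    userSignal?.removeEventListener("abort", onUserAbort);
  }
}
```

Rejecting the timeout before aborting the call matters. Both promises settle inside the same timer callback, and `Promise.race` reports whichever settled first.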
Configuration example:
```yaml
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"
      fallbacks:
        - "openai/gpt-4o"
        - "google/gemini-2.0-flash"
```
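Each entry in this config is a `provider/model` string, so the chain must be turned into `{provider, model}` candidates before fallback can run. The sketch below assumes the YAML has already been parsed into an object. It also assumes the provider is everything before the first slash, so model ids that contain slashes survive. `parseModelRef` and `buildCandidates` are hypothetical names.

```typescript
type ModelCandidate = { provider: string; model: string };

// Splits on the first slash only; rejects refs with no provider or no model part.
function parseModelRef(ref: string): ModelCandidate {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Primary first, then fallbacks in order, with repeated refs removed.
function buildCandidates(config: { primary: string; fallbacks?: string[] }): ModelCandidate[] {
  const refs = [config.primary, ...(config.fallbacks ?? [])];
  return Array.from(new Set(refs)).map(parseModelRef);
}

const candidates = buildCandidates({
  primary: "anthropic/claude-sonnet-4-20250514",
  fallbacks: ["openai/gpt-4o", "google/gemini-2.0-flash"],
});
```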
How to use it
- Define fallback chains: Specify an ordered list of alternative models per use case (coding vs. general chat).
- Configure allowlists: Restrict fallback to models that support your request format (e.g., image models only).
- Classify errors: Map provider-specific error codes to semantic reasons for consistent handling.
- Track attempts: Log each fallback attempt with provider, model, error, and reason for observability.
- Handle exhaustion: When all candidates fail, aggregate error messages to provide actionable feedback.
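The last two steps, tracking attempts and handling exhaustion, can share one error type. It keeps the structured history for logs and metrics and derives a readable summary from it. The sketch uses a hypothetical `FallbackExhaustedError`. The `Attempt` fields mirror the diagnostics listed earlier: error, reason, status code, and provider/model.

```typescript
type FailoverReason = "timeout" | "rate_limit" | "auth" | "billing" | "format" | "context_overflow";

type Attempt = {
  provider: string;
  model: string;
  error: unknown;
  reason: FailoverReason | null;
  status?: number;
};

// One line per attempt, e.g. "openai/gpt-4o [rate_limit 429] Too Many Requests".
function describeAttempt(a: Attempt): string {
  const message = a.error instanceof Error ? a.error.message : String(a.error);
  const status = a.status !== undefined ? ` ${a.status}` : "";
  return `${a.provider}/${a.model} [${a.reason ?? "unknown"}${status}] ${message}`;
}

// Hypothetical error type: structured attempts for tooling, summary for humans.
class FallbackExhaustedError extends Error {
  attempts: Attempt[];
  constructor(attempts: Attempt[]) {
    super(`All ${attempts.length} model candidates failed: ${attempts.map(describeAttempt).join(" | ")}`);
    this.name = "FallbackExhaustedError";
    this.attempts = attempts;
  }
}
```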
Pitfalls to avoid:
- Over-fallback: Long fallback chains can cascade one failure, and its retry load, across every provider in the chain. Use exponential backoff.
- Semantic mismatch: Fallback models may have different capabilities (vision, tools). Filter by required features.
- Silent failures: Some errors (`format`) indicate request incompatibility, so fallback may fail identically.
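For the over-fallback pitfall, one option is to apply exponential backoff as a per-candidate cooldown. A model that keeps failing is skipped for progressively longer instead of being retried on every request. The following are assumptions: `CooldownTracker`, the `provider/model` key format, and the 1 s base with a 60 s cap. `now` is injectable so the timing can be tested.

```typescript
// Hypothetical tracker: after the n-th consecutive failure a key is benched for
// baseMs * 2^(n - 1), capped at maxMs; a success clears its history.
class CooldownTracker {
  private readonly failures = new Map<string, number>();
  private readonly coolingUntil = new Map<string, number>();
  private readonly baseMs: number;
  private readonly maxMs: number;

  constructor(baseMs = 1_000, maxMs = 60_000) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
  }

  // Records a failure and returns how long the key now cools down, in ms.
  recordFailure(key: string, now: number = Date.now()): number {
    const count = (this.failures.get(key) ?? 0) + 1;
    this.failures.set(key, count);
    const delay = Math.min(this.maxMs, this.baseMs * 2 ** (count - 1));
    this.coolingUntil.set(key, now + delay);
    return delay;
  }

  recordSuccess(key: string): void {
    this.failures.delete(key);
    this.coolingUntil.delete(key);
  }

  isCoolingDown(key: string, now: number = Date.now()): boolean {
    return (this.coolingUntil.get(key) ?? 0) > now;
  }
}
```

Before each request, drop candidates whose key `isCoolingDown`. After each attempt, call `recordFailure` or `recordSuccess`. Skipping rather than sleeping keeps the chain moving while a struggling model recovers.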
Trade-offs
Pros:
- Resilience: Transient failures (timeouts, rate limits) don't block the agent.
- Cost optimization: Fall back to cheaper models when premium models are unavailable.
- Clear diagnostics: Attempt history shows which models failed and why.
- User abort respect: Distinguishes user cancellation from timeout, avoiding unnecessary fallbacks.
Cons/Considerations:
- Latency penalty: Each failed attempt adds round-trip time before success.
- Inconsistent outputs: Different models may respond differently, affecting downstream parsing.
- Cost accumulation: Fallback chains may incur multiple API charges for a single logical request.
- Complex configuration: Managing allowlists, chains, and provider-specific behavior adds operational overhead.
References
- Clawdbot model-fallback.ts - Fallback orchestration
- Clawdbot failover-error.ts - Error classification
- Clawdbot error helpers - Reason classification logic
- Related: Extended Coherence Work Sessions for reliability patterns