Every LLM provider has a different SDK, a different streaming format, a different error model. Switching from OpenAI to Anthropic mid-project meant rewriting the entire integration layer. This platform makes provider switching a one-line config change.
The Problem Space
Teams integrating LLMs face a recurring problem: each provider — OpenAI, Anthropic, Gemini, xAI, Fireworks — has its own SDK, authentication model, streaming format, and error taxonomy. Switching providers or supporting multiple simultaneously means maintaining parallel integration code. The frontend has to know which provider is active, which breaks the abstraction.
Engineering the Solution
I built a modular plugin architecture where each provider implements a standard interface. Request and response normalization happens at the gateway layer, so the frontend sees a single schema regardless of which provider generates the response. I used Next.js 15 Server Actions for streaming because they give you server-side execution with client-side streaming semantics without a separate API server.
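The adapter contract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names (`ProviderAdapter`, `NormalizedChunk`, `streamChat`) are assumptions chosen for clarity.

```typescript
// Illustrative sketch of the plugin contract; all identifiers here are
// assumptions, not the project's real names.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface NormalizedChunk {
  text: string;  // token-progressive text delta
  done: boolean; // true on the final chunk
  model: string; // normalized model identifier
}

interface ProviderAdapter {
  /** Provider id used in config, e.g. "openai" or "anthropic". */
  readonly id: string;
  /** Stream a completion, yielding chunks in the gateway's single schema. */
  streamChat(messages: ChatMessage[], model: string): AsyncIterable<NormalizedChunk>;
}

// A toy in-memory adapter showing the shape a real provider plugin would take.
const mockAdapter: ProviderAdapter = {
  id: "mock",
  async *streamChat(_messages, model) {
    for (const piece of ["Hello", " world"]) {
      yield { text: piece, done: false, model };
    }
    yield { text: "", done: true, model };
  },
};
```

Because every plugin yields the same normalized chunk shape, the frontend consumes a single stream type no matter which provider produced it.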
The result is a Next.js 15 TypeScript application with a provider plugin system supporting OpenAI, Anthropic, xAI/Grok, and Fireworks. Auth.js handles multi-user sessions backed by Neon Serverless Postgres, and Vercel Blob stores conversation artifacts. The API normalizes all provider responses into a single JSON schema and supports dynamic model selection based on task type and cost constraints.
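Dynamic model selection can be pictured as a lookup over a capability-and-cost catalog. The sketch below is a simplification under stated assumptions: the model names, prices, and routing policy are invented for illustration, not the production table.

```typescript
// Hypothetical model catalog; providers, names, and prices are assumptions.
type TaskType = "chat" | "code" | "summarize";

interface ModelOption {
  provider: string;
  model: string;
  costPer1kTokens: number; // USD, illustrative figures only
  tasks: TaskType[];
}

const catalog: ModelOption[] = [
  { provider: "openai", model: "fast-model", costPer1kTokens: 0.5, tasks: ["chat", "summarize"] },
  { provider: "anthropic", model: "strong-model", costPer1kTokens: 3.0, tasks: ["chat", "code"] },
];

// Pick the cheapest model that supports the task and fits the budget.
function selectModel(task: TaskType, maxCostPer1k: number): ModelOption | undefined {
  return catalog
    .filter((m) => m.tasks.includes(task) && m.costPer1kTokens <= maxCostPer1k)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens)[0];
}
```

The point of the design is that routing lives in the gateway: callers state a task and a budget, and never name a provider directly.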
Impact & Outcomes
Reduced provider onboarding effort by roughly 70%: adding a new LLM provider now means implementing one adapter interface instead of touching frontend, backend, and streaming code. The gateway also supports concurrent multi-session streaming with token-progressive delivery.
Reflections & Takeaways
Key observations from building this system:
- Provider normalization is harder than it looks. Each LLM handles system prompts, tool calls, and streaming chunks differently — the adapter layer needs to absorb all that variation.
- Server Actions are a clean alternative to separate API routes for streaming, but they require careful error boundary design because errors propagate differently than in traditional REST.
- Dynamic model selection based on cost and capability constraints is more useful than simple provider switching.
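The normalization point above can be made concrete with a sketch of how two providers' streaming events collapse into one delta shape. The raw event types below are simplified stand-ins for the real streaming formats, assumed here only to show where the adapter absorbs the variation.

```typescript
// Simplified stand-ins for provider streaming events (assumptions, not the
// full wire formats).
type RawOpenAIChunk = { choices: { delta: { content?: string } }[] };
type RawAnthropicEvent =
  | { type: "content_block_delta"; delta: { text: string } }
  | { type: "message_stop" };

interface NormalizedDelta {
  text: string;
  done: boolean;
}

// OpenAI-style chunks carry the text inside choices[0].delta; an empty or
// missing delta normalizes to an empty string.
function fromOpenAI(chunk: RawOpenAIChunk): NormalizedDelta {
  return { text: chunk.choices[0]?.delta.content ?? "", done: false };
}

// Anthropic-style streams signal completion with a distinct event type rather
// than a field on the delta, so the adapter translates that into `done`.
function fromAnthropic(event: RawAnthropicEvent): NormalizedDelta {
  if (event.type === "message_stop") return { text: "", done: true };
  return { text: event.delta.text, done: false };
}
```

Each provider plugin owns one such translation, so the variation in chunk shapes and end-of-stream signaling never leaks past the gateway.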