Most chatbot demos stop at 'send message, get response.' Lia goes further — every conversation turn is sentiment-scored in real time, streamed through SSE, and persisted with full conversation history. The interesting engineering isn't the chat. It's the infrastructure that makes the chat feel alive.

The Problem Space

Building a chat interface that feels responsive requires more than a POST request and a JSON response. Users expect streaming tokens, conversation memory, and the ability to switch models without losing context. Adding sentiment analysis on top means coordinating parallel AI calls without blocking the response stream.

Engineering the Solution

I separated the streaming protocol from the AI provider layer. The FastAPI backend uses Server-Sent Events over POST — not WebSockets — because SSE gives you the streaming UX without the connection management overhead. The provider adapter pattern means Gemini and OpenAI share a single interface, so switching models is a config change, not a code change. Sentiment runs as a parallel task during streaming, so the user sees tokens arriving while sentiment computation happens in the background.
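To make that shape concrete, here is a minimal sketch of the SSE-over-POST endpoint with the shared provider interface and the parallel sentiment task. All names here (ChatProvider, PROVIDERS, score_sentiment) are illustrative assumptions, not the actual codebase:

```python
import asyncio
import json
from typing import AsyncIterator, Protocol

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatProvider(Protocol):
    """Single interface both the Gemini and OpenAI adapters implement."""

    def stream_chat(self, messages: list[dict]) -> AsyncIterator[str]: ...


PROVIDERS: dict[str, ChatProvider] = {}  # adapters registered at startup from config


class ChatRequest(BaseModel):
    messages: list[dict]
    provider: str = "gemini"


async def score_sentiment(messages: list[dict]) -> dict:
    """Stand-in for the real analysis (separate LLM call or Cloud NLP)."""
    return {"label": "neutral", "score": 0.0}


@app.post("/chat")
async def chat(req: ChatRequest) -> StreamingResponse:
    provider = PROVIDERS[req.provider]  # switching models is a config lookup
    # Sentiment runs concurrently; the token stream never waits on it.
    sentiment_task = asyncio.create_task(score_sentiment(req.messages))

    async def event_stream() -> AsyncIterator[str]:
        async for token in provider.stream_chat(req.messages):
            yield f"data: {json.dumps({'type': 'token', 'text': token})}\n\n"
        # Emit sentiment as a trailing event once the reply is complete.
        sentiment = await sentiment_task
        yield f"data: {json.dumps({'type': 'sentiment', **sentiment})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

One consequence of SSE over POST: the browser's EventSource API only supports GET, so the frontend reads this stream with fetch and a ReadableStream parser instead. That trade is worth it to keep the conversation history and provider choice in a normal request body.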

The stack is a FastAPI async backend with JWT auth, conversation persistence in PostgreSQL, Redis-backed caching and rate limiting, and a React 19 frontend with a custom streaming hook that batches token updates at requestAnimationFrame cadence. The sentiment system supports three strategies: structured output (sentiment returned alongside the chat response in a single call), a separate LLM analysis call, and Google Cloud NLP. Conversations carry both per-message and cumulative sentiment state.
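As one illustrative detail, the cumulative state can be maintained incrementally so each turn costs O(1) regardless of conversation length. This dataclass is a hypothetical sketch of that bookkeeping; the field names are assumptions, not the real schema:

```python
from dataclasses import dataclass, field


@dataclass
class ConversationSentiment:
    # One score in [-1.0, 1.0] per user message, in order (hypothetical shape).
    per_message: list[float] = field(default_factory=list)
    # Running mean over the whole conversation.
    cumulative: float = 0.0

    def record(self, score: float) -> None:
        """Fold a new per-message score into the cumulative state."""
        self.per_message.append(score)
        n = len(self.per_message)
        # Incremental mean update avoids re-summing history on every turn.
        self.cumulative += (score - self.cumulative) / n


state = ConversationSentiment()
for s in (0.6, -0.2, 0.4):
    state.record(s)
print(state.per_message, round(state.cumulative, 3))  # [0.6, -0.2, 0.4] 0.267
```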

Impact & Outcomes

The result is a production-oriented chat system deployed on Render with a Vercel-hosted frontend. It supports multi-turn conversations with full history, real-time provider switching, and sentiment insights that update after each exchange without blocking the response stream.

Reflections & Takeaways

Key observations from building this system:

  • SSE over POST is underrated. You get streaming semantics with standard HTTP, no WebSocket upgrade negotiation, and natural compatibility with load balancers.
  • Parallel sentiment analysis during streaming requires careful task lifecycle management — cancelling orphaned tasks on disconnect prevents resource leaks (see the sketch after this list).
  • requestAnimationFrame batching in the streaming hook eliminates the jank that comes from updating React state on every token arrival.
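A minimal sketch of that cleanup pattern, reusing the hypothetical sentiment_task shape from the earlier example. Starlette's request.is_disconnected() is a real API; the surrounding names are assumptions:

```python
import asyncio
from typing import AsyncIterator

from fastapi import Request


async def guarded_stream(
    request: Request,
    tokens: AsyncIterator[str],
    sentiment_task: asyncio.Task,
) -> AsyncIterator[str]:
    try:
        async for token in tokens:
            # Stop producing as soon as the client goes away.
            if await request.is_disconnected():
                return  # the finally block still runs and cancels the task
            yield f"data: {token}\n\n"
        sentiment = await sentiment_task
        yield f"data: {sentiment}\n\n"
    finally:
        # On disconnect or any error, the sentiment task would otherwise
        # keep running with no consumer; cancel it explicitly.
        if not sentiment_task.done():
            sentiment_task.cancel()
```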
In hindsight: I would add structured logging from day one. Debugging streaming issues across frontend and backend without correlated request IDs was harder than it needed to be.

Next iteration: adding conversation branching — the ability to fork a conversation at any message and explore different response paths.