RAG-Powered Recommendations for E-Commerce — AI Case Study

Context

The Problem

A growing online retailer with 380K monthly active users was running a static, rules-based recommendation engine. The system relied on product-affinity rules that a small merchandising team maintained by hand, so it could not adapt to real-time behavioral signals or seasonal trends without manual updates.

Conversion rates from recommendations had plateaued while competitors deployed more sophisticated ML-driven systems. The internal engineering team lacked the specialized AI expertise needed to move model prototypes into production serving that responds in under 50ms.

Key Constraints

Recommendation API must respond in < 50ms at P99 under peak load

Must support the retailer's growing product catalog with nuanced cross-category recommendations

User behavior context must include the current session history, not just historical purchases

Explainability required: the merchandising team must understand why items are recommended

Architecture

The Solution

The solution was a lightweight retrieval-augmented generation (RAG) recommendation system built on LangGraph. The retrieval stage queries a Pinecone vector database indexed with product embeddings (generated from product descriptions and attributes) to fetch candidate sets in under 5ms.
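
The retrieval stage can be pictured as a top-k nearest-neighbor lookup over product embeddings. The sketch below is a hedged illustration only: it runs exact cosine similarity over a small in-memory dictionary in place of Pinecone's approximate (ANN) index, and the product IDs, toy 3-dimensional vectors, and the `retrieve_candidates` helper are hypothetical (text-embedding-3-small vectors have 1,536 dimensions by default).

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(
    query_vec: List[float],
    index: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Exact top-k search; a stand-in for the ANN query a vector database would run."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: product ID -> embedding.
product_index = {
    "trail-shoe": [0.9, 0.1, 0.0],
    "running-sock": [0.8, 0.3, 0.1],
    "rain-jacket": [0.2, 0.9, 0.1],
    "coffee-mug": [0.0, 0.1, 0.9],
}

# Hypothetical query vector, e.g. a user's current session vector.
session_vec = [1.0, 0.2, 0.0]
print(retrieve_candidates(session_vec, product_index, top_k=2))
```

With this toy data, "trail-shoe" ranks first and "running-sock" second. A production ANN index trades a small amount of recall for sub-linear search time, which is what makes a single-digit-millisecond retrieval budget feasible on a large catalog.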

A LangGraph agent then reasons over the candidate set, the user's real-time session context, and a set of business rules (margin targets, exclusions) to produce a ranked, explainable recommendation list. Alongside each list, the agent emits a reasoning trace that gives the merchandising team full visibility into every recommendation decision.
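
One way to picture the business-rule part of that reasoning is a deterministic filter-and-rerank pass that records a reason for each decision. This is a hedged sketch, not the LangGraph agent itself: the `Candidate` fields, the `MIN_MARGIN` threshold, the session-category boost, and the scoring formula are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Candidate:
    product_id: str
    similarity: float  # score from the retrieval stage
    margin: float      # gross margin as a fraction, e.g. 0.30 = 30%
    category: str


MIN_MARGIN = 0.20     # hypothetical margin target
SESSION_BOOST = 0.10  # hypothetical bonus for categories seen this session


def rank_with_trace(
    candidates: List[Candidate],
    excluded: Set[str],
    session_categories: Set[str],
) -> Tuple[List[str], List[str]]:
    """Apply exclusion and margin rules, re-rank, and log a reason for each step."""
    trace: List[str] = []
    kept: List[Tuple[float, Candidate]] = []
    for c in candidates:
        if c.product_id in excluded:
            trace.append(f"{c.product_id}: dropped (exclusion list)")
            continue
        if c.margin < MIN_MARGIN:
            trace.append(f"{c.product_id}: dropped (margin {c.margin:.0%} < {MIN_MARGIN:.0%})")
            continue
        score = c.similarity
        if c.category in session_categories:
            score += SESSION_BOOST
            trace.append(f"{c.product_id}: boosted (category '{c.category}' seen this session)")
        kept.append((score, c))
    kept.sort(key=lambda item: item[0], reverse=True)
    ranking = [c.product_id for _, c in kept]
    trace.append("final ranking: " + ", ".join(ranking))
    return ranking, trace


candidates = [
    Candidate("trail-shoe", 0.92, 0.35, "footwear"),
    Candidate("running-sock", 0.88, 0.15, "apparel"),
    Candidate("gps-watch", 0.85, 0.40, "electronics"),
    Candidate("water-bottle", 0.80, 0.50, "accessories"),
]
ranking, trace = rank_with_trace(
    candidates, excluded={"water-bottle"}, session_categories={"electronics"}
)
print(ranking)  # ['gps-watch', 'trail-shoe']
for line in trace:
    print(line)
```

The trace lines are the kind of plain-language record a merchandising team could audit; an LLM-backed agent would produce free-text reasoning rather than these fixed templates.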

Technical Architecture

Pinecone Vector Index

Product embeddings generated with OpenAI's text-embedding-3-small model. Approximate nearest-neighbor (ANN) retrieval returns candidates in < 5ms.

LangGraph Reasoning Agent

Stateful agent reasoning over candidates, session context, and business rules. Generates ranked recommendations with citations.

Real-Time Session Context

Clickstream events are streamed in to build a rolling session vector for each active user.
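
A rolling session vector can be maintained in several ways. The sketch below assumes an exponentially weighted moving average of the embeddings of products the user interacts with, so recent clicks outweigh older ones; the `SessionVector` class, the `alpha` weight, and treating each click as a product embedding are hypothetical choices, not the documented implementation.

```python
from typing import List, Optional


class SessionVector:
    """Exponentially weighted running average of clicked-product embeddings."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest event
        self.vector: Optional[List[float]] = None

    def update(self, item_embedding: List[float]) -> List[float]:
        if self.vector is None:
            # The first event in the session seeds the vector directly.
            self.vector = list(item_embedding)
            return self.vector
        if len(item_embedding) != len(self.vector):
            raise ValueError("embedding dimension mismatch")
        # new = alpha * latest event + (1 - alpha) * previous session vector
        self.vector = [
            self.alpha * new + (1.0 - self.alpha) * old
            for new, old in zip(item_embedding, self.vector)
        ]
        return self.vector


session = SessionVector(alpha=0.5)
session.update([1.0, 0.0])
session.update([0.0, 1.0])
print(session.vector)  # [0.5, 0.5]
```

With `alpha=0.5`, the weight of any earlier event halves with each new click, so the vector tracks the user's current intent within a session rather than their long-term history.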

FastAPI Serving Layer

Async FastAPI endpoints backed by Redis caching, with P99 latency under 18ms.
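
A common way to pair a serving layer with Redis is the cache-aside pattern: return a cached result when one exists, otherwise compute it and store it with a TTL. The sketch below is a hedged, synchronous illustration: `InMemoryTTLCache` is a hypothetical stand-in that mirrors redis-py's `get(key)` / `set(key, value, ex=seconds)` call shape so it runs without a Redis server, and the `recs:{user_id}` key scheme and 60-second TTL are assumptions.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryTTLCache:
    """Hypothetical stand-in with the same get/set(ex=...) call shape as redis-py."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, Optional[float]]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: str, ex: Optional[int] = None) -> None:
        expires_at = time.monotonic() + ex if ex is not None else None
        self._store[key] = (value, expires_at)


def get_recommendations(
    user_id: str,
    cache: InMemoryTTLCache,
    compute: Callable[[str], List[str]],
    ttl_seconds: int = 60,
) -> List[str]:
    """Cache-aside: serve from the cache when possible, otherwise compute and store."""
    key = f"recs:{user_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    recs = compute(user_id)  # slow path: retrieval plus agent reasoning
    cache.set(key, json.dumps(recs), ex=ttl_seconds)
    return recs


calls: List[str] = []


def slow_compute(user_id: str) -> List[str]:
    calls.append(user_id)
    return ["gps-watch", "trail-shoe"]


cache = InMemoryTTLCache()
get_recommendations("u42", cache, slow_compute)
get_recommendations("u42", cache, slow_compute)
print(len(calls))  # 1: the second call was served from the cache
```

In an async FastAPI service, the same logic would typically use the `redis.asyncio` client and `await` its calls. The TTL bounds how stale a cached list can get relative to the user's live session.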

"The 22% increase in basket size exceeded initial projections. The merchandising team also adopted the system quickly, citing improved transparency into how recommendations were generated."