Chat Memory: Multi-Turn Conversations


The Problem We Are Solving

Raj uses the SmartHR bot on his first day and asks:

"What tools do I need to set up?"

The bot answers. Then he asks:

"And what about the security training?"

The bot responds as if it were a brand-new conversation, with no context and no memory of what was just discussed. Raj has to repeat himself in every message. He tells Sarah: "This bot has goldfish memory."

The fix is ChatMemory.


What You Will Learn


Why AI Calls Are Stateless by Default

Each call to Llama is independent. Llama does not remember your previous message. Spring AI sends exactly what you put in the Prompt — nothing more.

To simulate memory, you resend the conversation history with every new request:

Turn 1 — You send:
  System: "You are an HR assistant..."
  User:   "What tools do I need?"

Turn 2 — You send:
  System:    "You are an HR assistant..."
  User:      "What tools do I need?"       ← repeated from Turn 1
  Assistant: "You will need Slack, Jira..."  ← Turn 1 response
  User:      "And what about security training?"  ← new question

InMemoryChatMemory manages this history for you automatically.
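To make the replay pattern concrete, here is a minimal plain-Java sketch of doing it by hand. The Message record, the buildRequest helper, and the class name are hypothetical and exist only for illustration; they are not Spring AI types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; these are not Spring AI classes.
class HistoryReplaySketch {

    record Message(String role, String text) {}

    static final String SYSTEM_PROMPT = "You are an HR assistant...";

    // Builds the full message list for one request: the system prompt,
    // every earlier turn, then the new user question.
    static List<Message> buildRequest(List<Message> history, String newQuestion) {
        List<Message> request = new ArrayList<>();
        request.add(new Message("system", SYSTEM_PROMPT));
        request.addAll(history);
        request.add(new Message("user", newQuestion));
        return request;
    }

    public static void main(String[] args) {
        List<Message> history = new ArrayList<>();

        // Turn 1: no history yet, so only system + user are sent.
        List<Message> turn1 = buildRequest(history, "What tools do I need?");
        System.out.println("Turn 1 sends " + turn1.size() + " messages");

        // Record the exchange so the next turn can replay it.
        history.add(new Message("user", "What tools do I need?"));
        history.add(new Message("assistant", "You will need Slack, Jira..."));

        // Turn 2: the request now carries the earlier exchange as well.
        List<Message> turn2 = buildRequest(history, "And what about security training?");
        turn2.forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

Turn 1 sends two messages and Turn 2 sends four, so the request grows with every exchange. That growth is why the history eventually needs a cap.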


How InMemoryChatMemory Works

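// In-memory store for conversation history, keyed by conversation ID
ChatMemory memory = new InMemoryChatMemory();

// Register the memory advisor so it runs on every call made through this client
ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultSystem(SYSTEM_PROMPT)
        .defaultAdvisors(new MessageChatMemoryAdvisor(memory))
        .build();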

The MessageChatMemoryAdvisor intercepts every call and:

1. Loads previous messages from memory for this conversation ID
2. Appends them to the prompt before sending to Llama
3. Saves the new user + assistant messages back to memory
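The three steps can be sketched in plain Java as follows. This is a simplified stand-in, not Spring AI's implementation: the MemoryAdvisorSketch class, its Message record, and the fake model function are all hypothetical, and the sketch ignores concurrent access to a single conversation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simplified stand-in for the load/append/save cycle; not Spring AI's implementation.
class MemoryAdvisorSketch {

    record Message(String role, String text) {}

    // Conversation ID -> ordered message history (hypothetical store).
    private final Map<String, List<Message>> store = new ConcurrentHashMap<>();

    // The "model" is any function from a full prompt to a reply.
    private final Function<List<Message>, String> model;

    MemoryAdvisorSketch(Function<List<Message>, String> model) {
        this.model = model;
    }

    String chat(String conversationId, String userText) {
        // 1. Load previous messages for this conversation ID.
        List<Message> history = store.computeIfAbsent(conversationId, id -> new ArrayList<>());

        // 2. Build the prompt: history first, then the new user message.
        List<Message> prompt = new ArrayList<>(history);
        Message user = new Message("user", userText);
        prompt.add(user);
        String reply = model.apply(prompt);

        // 3. Save the new user and assistant messages back to memory.
        history.add(user);
        history.add(new Message("assistant", reply));
        return reply;
    }

    // Drops all history for one conversation.
    void clear(String conversationId) {
        store.remove(conversationId);
    }

    public static void main(String[] args) {
        // Fake model that reports how many messages it received.
        MemoryAdvisorSketch advisor =
                new MemoryAdvisorSketch(prompt -> "saw " + prompt.size() + " message(s)");

        System.out.println(advisor.chat("raj-001", "What laptop should I request?"));
        System.out.println(advisor.chat("raj-001", "And what software comes pre-installed?"));
        System.out.println(advisor.chat("other-session", "Hello"));
    }
}
```

With the fake model, the second call on "raj-001" sees three messages (the first exchange plus the new question), while "other-session" sees only one. Keying history by conversation ID is what keeps sessions separate.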


What You Will Build — Stateful Onboarding Chatbot

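// Request body for POST /hr/onboard/chat
// CHAT_MEMORY_CONVERSATION_ID_KEY below is assumed to be statically imported
// from AbstractChatMemoryAdvisor (its location in Spring AI milestone releases).
public record OnboardRequest(String sessionId, String message) {}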

@PostMapping("/onboard/chat")
public HrResponse chat(@RequestBody OnboardRequest request) {
    String answer = chatClient
            .prompt()
            .user(request.message())
            .advisors(a -> a.param(CHAT_MEMORY_CONVERSATION_ID_KEY, request.sessionId()))
            .call()
            .content();
    return new HrResponse(request.message(), answer, "onboard");
}

// DELETE /hr/onboard/chat/{sessionId} — clear memory for a session
@DeleteMapping("/onboard/chat/{sessionId}")
public void clearSession(@PathVariable String sessionId) {
    memory.clear(sessionId);
}

Test a multi-turn conversation:

# Turn 1
curl -s -X POST http://localhost:8080/hr/onboard/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "raj-001", "message": "What laptop should I request?"}'

# Turn 2: bot remembers the laptop context
curl -s -X POST http://localhost:8080/hr/onboard/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "raj-001", "message": "And what software comes pre-installed?"}'

# Turn 3: still in context
curl -s -X POST http://localhost:8080/hr/onboard/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "raj-001", "message": "How long does setup usually take?"}'

Memory Limits

InMemoryChatMemory keeps history in the JVM heap. For production:

Concern                   Solution
Memory grows unbounded    Limit history with MessageWindowChatMemory
History lost on restart   Use a persistent store (Redis, database)
Multiple instances        Shared cache or database-backed memory

Chapter 6 uses in-memory storage for simplicity. Chapter 17 covers production-ready persistence.
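To illustrate the idea behind capping history, here is a minimal sliding-window sketch in plain Java. It is not the MessageWindowChatMemory implementation; the WindowedHistorySketch class and its maxMessages parameter are hypothetical, and the sketch simply drops the oldest messages once the cap is exceeded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sliding window; not the MessageWindowChatMemory implementation.
class WindowedHistorySketch {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowedHistorySketch(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    // Appends a message and evicts the oldest ones once the cap is exceeded.
    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // Returns a copy of the retained messages, oldest first.
    List<String> snapshot() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        WindowedHistorySketch window = new WindowedHistorySketch(4);
        for (int turn = 1; turn <= 3; turn++) {
            window.add("user turn " + turn);
            window.add("assistant turn " + turn);
        }
        // Six messages were added; only the last four remain.
        System.out.println(window.snapshot());
    }
}
```

The trade-off is that the bot forgets anything older than the window, so the cap should be large enough to cover a typical onboarding conversation.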


Summary

In this chapter you will:


What's Next

In Chapter 7, we build the most powerful feature yet: RAG (Retrieval-Augmented Generation). Sarah uploads TechCorp's policy documents, and employees can ask questions that are answered from the actual company policies, not from Llama's general training.


References

Code for this chapter: code/chapter-06-chat-memory/