Hello, Spring AI!

What you will build: A working HR Q&A endpoint for TechCorp's SmartHR Assistant. Sarah the HR Manager types a question. Llama answers. No OpenAI account needed.


The Problem We Are Solving

It is Monday morning at TechCorp. Sarah, the HR Manager, has 27 unread Slack messages waiting.

These are the same questions. Every week. Sarah spends two hours every Monday answering them instead of doing actual HR work.

Dev (that's us — the Java developer on the team) gets a Jira ticket:

SMARTHR-001: Build an AI assistant that can answer common HR questions so Sarah can stop copy-pasting the same answers every Monday.

This chapter builds the foundation. By the end, we will have a Spring Boot app that accepts an HR question and returns an intelligent answer — powered entirely by a local Llama model.


What Is Spring AI?

Spring AI is Spring's official abstraction layer for AI models. It does for AI what Spring Data did for databases: it gives you a consistent, framework-native API, so you can swap model providers through configuration rather than code changes:

Your Spring Boot App
        │
        ▼
   Spring AI API       ← one consistent interface
        │
   ┌────┴────┐
   │         │
Ollama    OpenAI      ← swap providers via config
(Llama)   (GPT-4)

We will use Ollama because it runs Llama entirely on your laptop. Free. Private. No API key.


How Ollama Works

Ollama is a tool that downloads and runs open-source LLMs locally. Think of it as Docker for AI models.

┌─────────────────────┐
│   Your Laptop       │
│                     │
│  Spring Boot App    │
│        │            │
│        │ HTTP       │
│        ▼            │
│  Ollama Server      │  ← runs on localhost:11434
│  (llama3.2 model)   │
└─────────────────────┘

Spring AI talks to Ollama over HTTP — the same way your app might call any REST API.
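
To make "over HTTP" concrete, here is a minimal sketch of the kind of request a client sends to Ollama's /api/chat endpoint, written with only the JDK's java.net.http types. The class and method names are hypothetical, the JSON escaping is deliberately naive, and the real request Spring AI builds carries more fields (options, system messages and so on). Treat it as an illustration of the wire format, not as Spring AI's implementation.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Sketch only: builds (but does not send) a chat request for a local Ollama server. */
final class OllamaRequestSketch {

    /** Escapes the characters that would break a JSON string literal. */
    static String jsonEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** JSON body with a single user message and streaming switched off. */
    static String chatBody(String model, String question) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(question) + "\"}],"
                + "\"stream\":false}";
    }

    /** POST <baseUrl>/api/chat with a JSON body; the timeout value is an arbitrary choice. */
    static HttpRequest chatRequest(String baseUrl, String model, String question) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat"))
                .timeout(Duration.ofSeconds(120))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, question)))
                .build();
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while Ollama is running. In the rest of the chapter Spring AI does this work for us.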


Setting Up Ollama

Step 1 — Install Ollama

| OS | Command |
|---|---|
| macOS | brew install ollama |
| Linux | curl -fsSL https://ollama.ai/install.sh \| sh |
| Windows | Download installer from https://ollama.ai |

Step 2 — Download the Llama Model

ollama pull llama3.2

This downloads the Llama 3.2 model (~2GB). It only happens once.

RAM requirements:

- llama3.2 (3B) → needs ~4GB RAM
- llama3.1:8b → needs ~8GB RAM (Llama 3.2 has no 8B text variant; the 8B model belongs to Llama 3.1)
- llama3.1:70b → needs ~48GB RAM (skip this one for now)

Step 3 — Start Ollama

ollama serve

Verify it is working:

curl -s http://localhost:11434/api/tags

You should see llama3.2 in the list.
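
The same check can be done from Java, for example as a startup sanity check. The sketch below assumes the /api/tags response is a JSON object with a models array whose entries carry a name field such as llama3.2:latest. The class name is hypothetical, and the regex is a shortcut rather than a real JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: checks whether a model shows up in Ollama's /api/tags listing. */
final class OllamaTagsCheck {

    // Matches "name":"<value>" pairs in the response (assumed response shape).
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

    /** True if any listed name equals the model or starts with "<model>:" (e.g. a ":latest" tag). */
    static boolean hasModel(String tagsJson, String model) {
        Matcher m = NAME.matcher(tagsJson);
        while (m.find()) {
            String name = m.group(1);
            if (name.equals(model) || name.startsWith(model + ":")) {
                return true;
            }
        }
        return false;
    }

    /** Fetches /api/tags from a running Ollama server and looks for the model. */
    static boolean isAvailable(String baseUrl, String model) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags")).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && hasModel(response.body(), model);
    }
}
```

Calling OllamaTagsCheck.isAvailable("http://localhost:11434", "llama3.2") throws a connection exception if Ollama is not running, which is itself a useful signal.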


Project Structure

code/chapter-01-hello-spring-ai/
├── pom.xml
└── src/main/
    ├── java/com/techcorp/smarthr/
    │   ├── SmartHrAssistantApplication.java     ← entry point
    │   ├── controller/
    │   │   └── HrChatController.java            ← REST endpoints
    │   └── model/
    │       ├── HrRequest.java                   ← request body
    │       └── HrResponse.java                  ← response body
    └── resources/
        └── application.yml                      ← Ollama config

The Code

1. Maven Dependencies (pom.xml)

The only Spring AI dependency we need for Chapter 1 is the Ollama starter:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>4.1.0-RC1</version>
    <relativePath/>
</parent>

<properties>
    <java.version>21</java.version>
    <spring-ai.version>2.0.0-M6</spring-ai.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webmvc</artifactId>
    </dependency>

    <!-- This one line connects Spring AI to Ollama -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-ollama</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webmvc-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

The BOM (Bill of Materials) manages Spring AI version compatibility across all its modules. Note that the starter artifact was renamed during the Spring AI 1.0 milestone releases, so the change predates 2.x: it is now spring-ai-starter-model-ollama instead of the older spring-ai-ollama-spring-boot-starter.

2. Configuration (application.yml)

spring:
  application:
    name: SmartHR Assistant

  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2 # or phi3:mini, gemma2, mistral
          temperature: 0.3

server:
  port: 8080

What is temperature?

Temperature controls how creative (or random) the model's response is:

| Temperature | Behaviour | Good for |
|---|---|---|
| 0.0 | Near-deterministic: the same input almost always gives the same output | Factual lookups |
| 0.3 | Slightly varied, mostly consistent | HR policies, Q&A |
| 0.7 | Creative, varied responses | Writing, brainstorming |
| 1.0 | Very random | Creative fiction |

We use 0.3 for HR — we want consistent, professional answers.
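
To see why a lower temperature gives more consistent answers, it helps to look at the arithmetic. A common way to apply temperature is to divide the model's raw token scores (logits) by it before turning them into probabilities with softmax. The sketch below is a simplified illustration of that idea with made-up scores; it is not Ollama's actual sampling code, and the class name is hypothetical.

```java
/** Sketch only: temperature-scaled softmax over a handful of made-up token scores. */
final class TemperatureSketch {

    /** Converts raw scores into probabilities; smaller temperatures sharpen the distribution. */
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            // Temperature 0 is usually handled separately as "always pick the top token".
            throw new IllegalArgumentException("temperature must be > 0");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) {
            max = Math.max(max, logit);
        }
        double[] probabilities = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max keeps exp() numerically stable without changing the result.
            probabilities[i] = Math.exp((logits[i] - max) / temperature);
            sum += probabilities[i];
        }
        for (int i = 0; i < probabilities.length; i++) {
            probabilities[i] /= sum;
        }
        return probabilities;
    }
}
```

With scores of 2.0, 1.0 and 0.0 for three candidate tokens, a temperature of 0.3 gives the top token roughly a 96% chance of being picked, while a temperature of 1.0 gives it roughly 67%. The lower setting makes the most likely wording win almost every time, which is what we want for policy answers.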

3. The Request and Response Models

// HrRequest.java
public record HrRequest(String question) {}

// HrResponse.java
public record HrResponse(String question, String answer) {}

Java records are a clean fit here: immutable, no boilerplate, with an auto-generated constructor and accessor methods (question(), answer()) rather than JavaBean-style getters.
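
A short sketch of what the records give us for free. The compact constructor that rejects blank questions is a hypothetical addition for illustration and is not part of the chapter's code.

```java
/** Request body; the validation rule below is an illustrative assumption. */
record HrRequest(String question) {
    // Compact constructor: runs before the field is assigned.
    HrRequest {
        if (question == null || question.isBlank()) {
            throw new IllegalArgumentException("question must not be blank");
        }
    }
}

/** Response body: echoes the question alongside the model's answer. */
record HrResponse(String question, String answer) {}
```

Accessors are named after the components, so the controller reads the question with request.question(). Two records with the same component values are equal, which keeps tests simple.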

4. The Controller — where the magic happens

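package com.techcorp.smarthr.controller;

import com.techcorp.smarthr.model.HrRequest;
import com.techcorp.smarthr.model.HrResponse;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController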
@RequestMapping("/hr")
public class HrChatController {

    private static final String SYSTEM_PROMPT = """
            You are an HR assistant for TechCorp, a mid-sized technology company.
            Your job is to answer employee questions about HR policies, benefits,
            leave, onboarding, and workplace guidelines clearly and professionally.
            Keep answers concise and factual. If you do not know the answer,
            say so honestly and suggest contacting the HR department directly.
            """;

    private final ChatClient chatClient;

    public HrChatController(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem(SYSTEM_PROMPT)
                .build();
    }

    @PostMapping("/ask")
    public HrResponse ask(@RequestBody HrRequest request) {
        String answer = chatClient
                .prompt()
                .user(request.question())
                .call()
                .content();
        return new HrResponse(request.question(), answer);
    }
}

Understanding the Code — Three Key Concepts

Concept 1: The System Prompt

private static final String SYSTEM_PROMPT = """
        You are an HR assistant for TechCorp...
        """;

A system prompt is a set of instructions you give the model before the conversation starts. It defines:

- Who the model is (HR assistant, not a general chatbot)
- What it should and shouldn't do
- Tone and style (professional, concise)

Without a system prompt, Llama would answer anything. With one, it is steered to stay in character as an HR assistant, although a prompt is guidance rather than a hard guarantee.

Think of it as the job description you hand to a new employee on their first day.
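
Under the hood, chat APIs such as Ollama's /api/chat receive the conversation as a list of role-tagged messages, with the system prompt first and the user's question after it. A minimal sketch of that assembly, using hypothetical class and record names:

```java
import java.util.List;

/** Sketch only: how a system prompt and a user question become a message list. */
final class PromptAssemblySketch {

    /** One chat message; the role is "system" or "user" in this example. */
    record Message(String role, String content) {}

    /** The system message always comes first so it frames everything that follows. */
    static List<Message> assemble(String systemPrompt, String userQuestion) {
        return List.of(
                new Message("system", systemPrompt),
                new Message("user", userQuestion));
    }
}
```

Spring AI builds an equivalent structure for us when we combine defaultSystem() with .user(), so we never assemble this list by hand.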

Concept 2: ChatClient and ChatClient.Builder

Spring AI auto-configures a ChatClient.Builder bean. You inject the builder, not the client directly, because the builder lets you set defaults:

// ChatClient.Builder is auto-configured by Spring AI — just inject it
public HrChatController(ChatClient.Builder builder) {
    this.chatClient = builder
            .defaultSystem(SYSTEM_PROMPT)   // applied to every call
            .build();
}

defaultSystem() means every request to this ChatClient will automatically include the system prompt. You set it once, use it everywhere.

Concept 3: The Fluent Prompt API

chatClient
    .prompt()               // start building a prompt
    .user(question)         // set the user's message
    .call()                 // send to Llama, wait for response
    .content();             // extract the response text

This is Spring AI's fluent API. Each method call adds to the prompt or processes the response:

.prompt()   →  starts a request spec (ChatClient.ChatClientRequestSpec)
.user()     →  adds the user message
.call()     →  sends the HTTP request to Ollama
.content()  →  extracts the String from the response
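
If the chaining style is new to you, the toy class below shows the pattern in plain Java. Each step returns an object whose methods are the next valid steps. It is a hypothetical stand-in with a fake "model" function and is not how Spring AI is implemented.

```java
import java.util.function.UnaryOperator;

/** Toy fluent client for illustration; the "model" is just a function from prompt text to answer. */
final class MiniChatClient {
    private final String defaultSystem;
    private final UnaryOperator<String> model;

    MiniChatClient(String defaultSystem, UnaryOperator<String> model) {
        this.defaultSystem = defaultSystem;
        this.model = model;
    }

    /** Starts a new request; every request inherits the default system prompt. */
    RequestSpec prompt() {
        return new RequestSpec();
    }

    /** Collects the pieces of one request. */
    final class RequestSpec {
        private String user = "";

        RequestSpec user(String text) {
            this.user = text;
            return this; // returning "this" is what allows the chaining
        }

        /** Hands the assembled prompt to the fake model and wraps the answer. */
        ResponseSpec call() {
            return new ResponseSpec(model.apply("[system] " + defaultSystem + "\n[user] " + user));
        }
    }

    /** Wraps the answer so callers can choose how to read it. */
    static final class ResponseSpec {
        private final String content;

        ResponseSpec(String content) {
            this.content = content;
        }

        String content() {
            return content;
        }
    }
}
```

Usage mirrors the chapter's call chain: new MiniChatClient("You are an HR assistant.", p -> "echo: " + p).prompt().user("Hi").call().content().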

Run the Application

Step 1 — Start Ollama

ollama serve

Step 2 — Run the Spring Boot App

cd code/chapter-01-hello-spring-ai
mvn spring-boot:run

Step 3 — Ask Sarah's Monday Questions

Question 1: Vacation policy

curl -s -X POST http://localhost:8080/hr/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How many vacation days do new employees get?"}'

A typical response (the exact wording varies from run to run):

{
  "question": "How many vacation days do new employees get?",
  "answer": "At TechCorp, new employees typically receive 15 days of paid vacation per year during their first year of employment."
}

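Quick browser test (GET endpoint)

The controller listing above shows only the POST handler, so this test assumes the chapter's code also maps GET /hr/ask with a question query parameter: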

http://localhost:8080/hr/ask?question=When+does+health+insurance+start+for+new+hires?
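
When you build such a URL in code rather than typing it into a browser, the question must be URL-encoded. The sketch below uses the JDK's URLEncoder, which turns spaces into + and percent-encodes characters such as the question mark. The class name is hypothetical, and the GET mapping itself is an assumption, since the controller listing in this chapter only shows POST.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch only: builds the GET URL for a question, assuming a GET /hr/ask?question=... mapping exists. */
final class AskUrlSketch {

    /** Form-encodes the question so spaces and punctuation survive as a query parameter. */
    static String askUrl(String baseUrl, String question) {
        return baseUrl + "/hr/ask?question=" + URLEncoder.encode(question, StandardCharsets.UTF_8);
    }
}
```

Browsers are forgiving about a trailing ? typed into the address bar, but encoding it explicitly avoids surprises with other HTTP clients.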

What Just Happened?

Sarah types: "How many vacation days do I get?"
     │
     ▼
POST /hr/ask  {"question": "..."}
     │
     ▼
HrChatController.ask()
     ├── System prompt: "You are an HR assistant for TechCorp..."
     └── User message:  "How many vacation days do I get?"
     │
     ▼
ChatClient → Spring AI → Ollama HTTP API → Llama 3.2
     │
     ▼
HrResponse { question: "...", answer: "..." }
     │
     ▼
Sarah reads the answer. No Slack message needed.

Common Errors and Fixes

| Error | Cause | Fix |
|---|---|---|
| Connection refused localhost:11434 | Ollama not running | Run ollama serve |
| model not found | Model not downloaded | Run ollama pull llama3.2 |
| Port 8080 already in use | Another app on 8080 | Set server.port=8081 |
| Response is very slow | Model too large for RAM | Switch to phi3:mini |

Summary

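In this chapter you:

- Installed Ollama and pulled the llama3.2 model
- Added the Spring AI Ollama starter and configured it in application.yml
- Built a REST endpoint with ChatClient and a default system prompt
- Asked the model HR questions over HTTP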

Sarah can now point employees to the HR chatbot. Her Monday mornings just got better.


Where This Chatbot Falls Short

The assistant works — but take a close look at that first response:

{
  "answer": "At TechCorp, new employees typically receive 15 days of paid vacation per year during their first year of employment."
}

That number is invented. Llama has never seen TechCorp's employee handbook. It answered from patterns in its public training data, which most likely includes plenty of generic HR content. It sounded confident and professional, but it could be completely wrong for TechCorp.

This is the core problem with the Chapter 1 approach:

| Shortcoming | What happens | Real impact |
|---|---|---|
| Responses from public data | The model answers from general internet knowledge, not TechCorp's actual policies | Employees get plausible-sounding but potentially wrong information |
| Hallucinated specifics | Policy numbers, dates, and thresholds are guessed | "15 days" when TechCorp may actually give 20 |
| No TechCorp context | The model knows nothing about TechCorp's org chart, systems, or internal processes | Questions like "Who approves my leave?" get generic answers |
| No memory between calls | Each question is independent, so the model forgets previous exchanges | Users cannot ask follow-up questions naturally |
| No audit trail | There is no record of what the model told employees | HR cannot verify or correct bad advice |

A Concrete Example

Ask the chatbot something TechCorp-specific:

curl -s -X POST http://localhost:8080/hr/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is TechCorp'\''s parental leave policy?"}'

The model will answer confidently. It might say 12 weeks. It might say 16 weeks. Whatever it says, it is a guess — because the system prompt tells Llama it is an HR assistant for TechCorp, but it has never actually read TechCorp's parental leave policy document.

The system prompt shapes tone and persona. It does not give the model knowledge it does not have.

The Fix Coming in Later Chapters

The solution is Retrieval-Augmented Generation (RAG): instead of asking the model to remember policies, you retrieve the relevant policy text from a document store and inject it directly into the prompt at query time.

Employee question → Search TechCorp policy docs → Inject matching text → Ask Llama

That is Chapter 4. But before we get there, Chapter 2 covers how the model processes text (tokens) and how to control its response behaviour precisely — both of which you will need to build RAG well.
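
As a preview, the retrieve-then-inject flow can be sketched in a few lines of plain Java. Everything here is a toy under stated assumptions: retrieval is naive keyword overlap rather than the document search a real RAG pipeline would use, the class and method names are hypothetical, and any policy text you pass in is your own sample data.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy sketch of the RAG flow: pick the best-matching policy text and put it in the prompt. */
final class ToyRagSketch {

    /** Lower-cased words longer than two characters, used for crude matching. */
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                .filter(word -> word.length() > 2)
                .collect(Collectors.toSet());
    }

    /** Returns the document sharing the most words with the question ("" if there are none). */
    static String retrieve(String question, List<String> docs) {
        Set<String> questionWords = words(question);
        return docs.stream()
                .max(Comparator.comparingLong(
                        (String doc) -> words(doc).stream().filter(questionWords::contains).count()))
                .orElse("");
    }

    /** Injects the retrieved text ahead of the question so the model answers from it. */
    static String augment(String question, List<String> docs) {
        return "Answer using only this policy text:\n" + retrieve(question, docs)
                + "\n\nQuestion: " + question;
    }
}
```

The string returned by augment() is what would be passed to .user() instead of the bare question, so the model reads the policy text rather than guessing at it.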


What's Next

In Chapter 2, we go under the hood — learning how tokens control response length, how Spring AI's message architecture works, and how to tune the model per request with ChatOptions.

Code for this chapter: code/chapter-01-hello-spring-ai/