Running and Comparing Multiple Models with Ollama

The Problem We Are Solving

Sarah has heard that there are other AI models besides Llama. A colleague told her that mistral is faster, and codellama is better at technical questions. She asks Dev:

"Can we try different models without rewriting the whole app? And can we see how they compare?"

The answer is yes. Switching the default model takes one line of config, and a small endpoint lets you compare two models side by side.

What You Will Learn

How Ollama manages multiple models on your machine
How to switch models via application.yml (zero code changes)
How to switch models per-request using Ollama-specific options
How to build a model comparison endpoint
Which models work best for which HR tasks

Ollama Model Management

# Pull models
ollama pull llama3.2       # default — good all-rounder
ollama pull mistral        # fast, strong reasoning
ollama pull codellama      # optimised for code
ollama pull phi3:mini      # tiny — runs on 4GB RAM
ollama pull gemma2         # Google's open model

# List what you have
ollama list

# Remove a model you no longer need (gemma2 here; mistral is used later in this chapter)
ollama rm gemma2

Switching Models via Config (No Code Changes)

# application.yml — change this one line to switch models
spring:
  ai:
    ollama:
      chat:
        options:
          model: mistral

Restart the app. Every endpoint now uses mistral, with no Java changes, provided the model has already been pulled (ollama pull mistral).
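In a default setup, a model that was never pulled (or a typo in its name) typically shows up only when the first request reaches Ollama, as a model-not-found error. A quick pre-flight check is sketched below. It asks the local Ollama server for its installed models through the REST endpoint GET /api/tags, which reports the same models as ollama list, and looks for the configured name. It assumes Ollama's default address http://localhost:11434 and extracts names with a regular expression instead of a JSON library. The OllamaModelCheck class is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-flight check: is a model available in the local Ollama install?
public class OllamaModelCheck {

    // Assumes Ollama's default address; adjust if your base URL differs
    private static final URI TAGS_URI = URI.create("http://localhost:11434/api/tags");
    // Matches each "name":"..." entry in the /api/tags JSON (a regex, not a full JSON parser)
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

    static List<String> parseModelNames(String json) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(json);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Ollama treats an untagged name such as "mistral" as "mistral:latest"
    static String withTag(String model) {
        return model.contains(":") ? model : model + ":latest";
    }

    static boolean isInstalled(String model, List<String> installed) {
        String wanted = withTag(model);
        return installed.stream().anyMatch(n -> withTag(n).equals(wanted));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String model = args.length > 0 ? args[0] : "mistral";
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(TAGS_URI).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        List<String> installed = parseModelNames(response.body());
        System.out.println(isInstalled(model, installed)
                ? model + " is installed"
                : model + " is missing; run: ollama pull " + model);
    }
}
```

Run it with the model name as the first argument, for example java OllamaModelCheck.java mistral.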

Switching Models Per-Request

// Override the default model for a single call using Ollama-specific options
// (Spring AI 1.0.x; requires import org.springframework.ai.ollama.api.OllamaOptions)
String answer = chatClient
        .prompt()
        .user(request.question())
        .options(OllamaOptions.builder().model("mistral").build())
        .call()
        .content();
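To see what the per-request option actually changes, it helps to look one level down. Spring AI's Ollama integration talks to Ollama's chat endpoint (POST /api/chat), and the model is just the "model" field in each request's JSON body, so choosing a different model per request needs nothing extra on the client side (Ollama may still need to load that model into memory first). The sketch below makes the call directly with java.net.http, bypassing Spring AI. The RawOllamaChat class and its minimal JSON escaping are illustrative assumptions, and it expects Ollama on its default port 11434.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustration only: a raw call to Ollama's chat endpoint, showing that the
// model is chosen per request by the "model" field of the JSON body.
public class RawOllamaChat {

    // Minimal JSON string escaping: quotes, backslashes and control characters
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Body for POST /api/chat; "stream": false asks for a single JSON response
    static String chatRequestBody(String model, String question) {
        return "{\"model\":" + jsonString(model)
                + ",\"messages\":[{\"role\":\"user\",\"content\":" + jsonString(question) + "}]"
                + ",\"stream\":false}";
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        chatRequestBody("mistral", "What is a probation period?")))
                .build();
        // Prints the raw JSON reply; the answer text is in message.content
        String reply = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(reply);
    }
}
```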

What You Will Build — Model Comparison Endpoint

// POST /hr/ask/compare
// Sends the same question to two models and returns both answers.
// These members go in the HR controller (class-level @RequestMapping("/hr"), injected chatClient)
// and need: import org.springframework.ai.ollama.api.OllamaOptions;

public record CompareRequest(String question, String modelA, String modelB) {}
public record CompareResponse(String question, String answerA, String answerB,
                               String modelA, String modelB) {}

@PostMapping("/ask/compare")
public CompareResponse compare(@RequestBody CompareRequest request) {
    String answerA = askWithModel(request.question(), request.modelA());
    String answerB = askWithModel(request.question(), request.modelB());
    return new CompareResponse(request.question(), answerA, answerB,
                               request.modelA(), request.modelB());
}

private String askWithModel(String question, String model) {
    return chatClient.prompt()
            .user(question)
            // Per-request options override spring.ai.ollama.chat.options.model for this call only
            .options(OllamaOptions.builder().model(model).build())
            .call()
            .content();
}
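As written, compare asks the two models one after the other, so the response takes roughly the sum of both latencies. A variant that issues both calls concurrently is sketched below with CompletableFuture. To keep it self-contained and testable, the model call is abstracted as a BiFunction of (question, model) to answer rather than the ChatClient; in the controller you would pass this::askWithModel. Whether it is actually faster depends on Ollama: both models must fit in memory at once, and the server may queue concurrent requests (see its OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL settings).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

// Sketch: ask two models the same question concurrently.
// "ask" stands in for askWithModel(question, model); that wiring is an assumption.
public class ParallelCompare {

    public record Answers(String answerA, String answerB) {}

    static Answers compare(String question, String modelA, String modelB,
                           BiFunction<String, String, String> ask) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> a =
                    CompletableFuture.supplyAsync(() -> ask.apply(question, modelA), pool);
            CompletableFuture<String> b =
                    CompletableFuture.supplyAsync(() -> ask.apply(question, modelB), pool);
            // join() waits for each result; a failure in either call is rethrown
            // here wrapped in a CompletionException
            return new Answers(a.join(), b.join());
        } finally {
            pool.shutdown();
        }
    }
}
```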

Test it:

curl -s -X POST http://localhost:8080/hr/ask/compare \
  -H "Content-Type: application/json" \
  -d '{"question": "What is a good onboarding plan for a new software engineer?",
       "modelA": "llama3.2",
       "modelB": "mistral"}'

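The endpoint compares answers; to check claims such as "mistral is faster" on your own hardware, you can also time each call. A minimal sketch follows. It wraps any call that returns a value (hypothetically, a lambda around the controller's askWithModel) and measures wall-clock time with System.nanoTime(). The first request to a model usually includes loading it into memory, so make a warm-up call or discard the first measurement before comparing.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Sketch: measure how long a single model call takes.
public class TimedCall {

    public record Timed<T>(T value, Duration elapsed) {}

    static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        return new Timed<>(value, Duration.ofNanos(System.nanoTime() - start));
    }

    public static void main(String[] args) {
        // Stand-in for askWithModel(question, "mistral"); replace with the real call
        Timed<String> result = time(() -> "stub answer");
        System.out.println(result.value() + " in " + result.elapsed().toMillis() + " ms");
    }
}
```
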
Model Cheat Sheet

Model	Approx. RAM	Best for	Speed
`llama3.2` (3B)	4GB	General Q&A	Fast
`llama3.1:8b`	8GB	Better reasoning	Medium
`mistral`	5GB	Instruction following	Fast
`codellama`	4GB	Code, technical docs	Fast
`phi3:mini`	2GB	Lightweight, constrained environments	Very fast
`gemma2`	5GB	Multilingual	Medium
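One way to act on the cheat sheet is a small routing table from HR task type to a default model. The mapping below is a hypothetical starting point that simply mirrors the "Best for" column; the HrTask names are illustrative, and the choices are worth revisiting after comparing answers to your own questions.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical routing table derived from the cheat sheet's "Best for" column.
public class ModelRouter {

    public enum HrTask {
        GENERAL_QA, POLICY_REASONING, INSTRUCTION_FOLLOWING,
        TECHNICAL_DOCS, LOW_RESOURCE, MULTILINGUAL
    }

    private static final Map<HrTask, String> DEFAULTS = new EnumMap<>(Map.of(
            HrTask.GENERAL_QA, "llama3.2",
            HrTask.POLICY_REASONING, "llama3.1:8b",
            HrTask.INSTRUCTION_FOLLOWING, "mistral",
            HrTask.TECHNICAL_DOCS, "codellama",
            HrTask.LOW_RESOURCE, "phi3:mini",
            HrTask.MULTILINGUAL, "gemma2"));

    // Falls back to the general-purpose model if a task is added without an entry
    static String modelFor(HrTask task) {
        return DEFAULTS.getOrDefault(task, "llama3.2");
    }
}
```

The returned name can be passed to the same per-request option that askWithModel uses.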

Summary

In this chapter you learned how to:

Manage multiple Ollama models from the command line
Switch models with a single config change
Override the model per-request using Ollama-specific options
Build a side-by-side model comparison endpoint

What's Next

In Chapter 4, we tackle prompt engineering — using PromptTemplate to inject dynamic data (employee name, department, role) into prompts so the HR assistant gives personalised, TechCorp-branded responses.

Code for this chapter: code/chapter-03-comparing-models/
