Graph RAG with Neo4j

What you will build: Upgrade the policy Q&A to Neo4j — a graph database where policy sections are connected by relationships, enabling Graph RAG to answer multi-topic questions that flat vector search cannot handle.


The Problem We Are Solving

PgVector from Chapter 8 handles single-topic questions well. But Sarah gets a report from the legal team:

"An employee asked: 'What happens if my parental leave runs out and I'm still unwell?' The bot answered about parental leave but completely missed the sick leave continuation clause. These policies are connected — the bot needs to know that."

This is the limitation of flat vector search. Each chunk is retrieved independently. There is no awareness that "parental leave policy" and "sick leave continuation" are related — and that a question about one should also surface the other.

Graph RAG solves this by storing relationships between policy sections. When the similarity search returns the parental leave chunk, the retriever can follow the graph edge to the sick leave chunk, because we explicitly recorded that the two are connected.


What You Will Learn


Plain RAG vs Graph RAG

Plain RAG (Chapters 7 and 8):

  Question ──► embed ──► find top-K similar chunks ──► inject into prompt

  Problem: each chunk is isolated. No awareness of connections.

Graph RAG (Chapter 9):

  Question ──► embed ──► find top-K similar chunks
                               │
                               ▼
                    traverse graph edges
                               │
                               ▼
                    retrieve connected chunks too ──► inject all into prompt

  Benefit: related policy sections surface automatically.
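
The extra step in the Graph RAG diagram, "traverse graph edges", can be sketched in plain Java without any database. This is only an illustration under stated assumptions: the class GraphExpansionSketch, the method expand, and the chunk ids are hypothetical, the graph is a simple in-memory map, and only one hop along outgoing edges is followed. Whether a particular vector store performs this expansion for you is something to verify against its documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the policy graph; not part of Spring AI or Neo4j.
final class GraphExpansionSketch {

    /**
     * Returns the similarity hits followed by every chunk directly connected to them.
     * A LinkedHashSet keeps the original hits first and drops duplicates.
     */
    static List<String> expand(List<String> hits, Map<String, List<String>> edges) {
        Set<String> result = new LinkedHashSet<>(hits);
        for (String hit : hits) {
            // One hop along outgoing edges only; chunks without edges add nothing.
            result.addAll(edges.getOrDefault(hit, List.of()));
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Assumed edges, mirroring the relationships described in this chapter.
        Map<String, List<String>> edges = Map.of(
                "parental-leave", List.of("sick-leave"),
                "it-equipment", List.of("onboarding"));

        List<String> plainRag = List.of("parental-leave"); // similarity hits only
        List<String> graphRag = expand(plainRag, edges);   // hits plus neighbours

        System.out.println(plainRag); // prints [parental-leave]
        System.out.println(graphRag); // prints [parental-leave, sick-leave]
    }
}
```

Keeping the similarity hits ahead of their neighbours preserves the relevance ordering when the combined list is injected into the prompt.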

Example:

| Question | Plain RAG retrieves | Graph RAG retrieves |
| --- | --- | --- |
| "What if parental leave runs out and I'm still ill?" | Parental leave chunk only | Parental leave chunk + sick leave chunk (connected by edge) |
| "What laptop do engineers get and how do I request it?" | IT equipment chunk only | IT equipment chunk + onboarding chunk (connected) |

The Only Code That Changes

Chapter 8 — PgVectorStore:

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel, JdbcTemplate jdbcTemplate) {
    return PgVectorStore.builder(jdbcTemplate, embeddingModel)
            .initializeSchema(true)
            .build();
}

Chapter 9 — Neo4jVectorStore:

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel, Driver driver) {
    return Neo4jVectorStore.builder(driver, embeddingModel)
            .initializeSchema(true)
            .build();
}

# application.yml
spring:
  neo4j:
    uri: bolt://localhost:7687
    authentication:
      username: neo4j
      password: smarthr123
  ai:
    vectorstore:
      neo4j:
        initialize-schema: true
        embedding-dimension: 768
        distance-type: COSINE

The controller, QuestionAnswerAdvisor, TikaDocumentReader, and TokenTextSplitter remain identical to Chapters 7 and 8.


Docker Setup

# docker-compose.yml
services:
  neo4j:
    image: neo4j:5
    environment:
      NEO4J_AUTH: neo4j/smarthr123
      NEO4J_PLUGINS: '["apoc"]'
    ports:
      - "7474:7474"   # Neo4j Browser UI
      - "7687:7687"   # Bolt protocol
    volumes:
      - neo4jdata:/data

volumes:
  neo4jdata:

docker-compose up -d

Open http://localhost:7474 to access the Neo4j Browser — you can visually explore the policy graph, see nodes, and traverse relationships.


How Neo4j Stores Documents

In Neo4j, each policy chunk becomes a node in the graph:

(:Document {
  id: "abc-123",
  text: "Primary caregivers are entitled to 16 weeks of fully paid parental leave.",
  embedding: [0.12, -0.84, 0.33, ...],
  source: "techcorp-hr-policy.txt"
})

Relationships between related policy sections are edges:

(:Document {text: "...parental leave..."})
  -[:RELATED_TO]->
(:Document {text: "...sick leave continuation..."})

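Spring AI's Neo4jVectorStore runs the vector similarity search in Cypher. Conceptually the query is equivalent to the following; in practice the store is expected to query a vector index rather than score every node: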

MATCH (doc:Document)
WITH doc, vector.similarity.cosine(doc.embedding, $queryEmbedding) AS score
WHERE score > 0.7
RETURN doc ORDER BY score DESC LIMIT 4
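
To make the three clauses of that query concrete (score, threshold, ordered limit), here is a minimal plain-Java sketch of the same logic over an in-memory map. It is an illustration, not Spring AI's implementation: the class SimilaritySearchSketch, its method names, and the two-dimensional toy embeddings are all hypothetical. It computes raw cosine similarity; Neo4j may rescale its own score, so treat the 0.7 threshold as illustrative rather than directly comparable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the Cypher query; not Spring AI's implementation.
final class SimilaritySearchSketch {

    /** Raw cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps chunks scoring above the threshold, best first, capped at limit. */
    static List<String> search(Map<String, double[]> chunks, double[] query,
                               double threshold, int limit) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> chunk : chunks.entrySet()) {
            double score = cosine(chunk.getValue(), query);
            if (score > threshold) {                        // WHERE score > 0.7
                scored.add(Map.entry(chunk.getKey(), Double.valueOf(score)));
            }
        }
        // ORDER BY score DESC
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Double> hit : scored) {
            if (ids.size() == limit) break;                 // LIMIT 4
            ids.add(hit.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, double[]> chunks = new LinkedHashMap<>();
        chunks.put("parental-leave", new double[] {1.0, 0.0});
        chunks.put("sick-leave", new double[] {0.8, 0.6});
        chunks.put("it-equipment", new double[] {0.0, 1.0});

        // prints [parental-leave, sick-leave]
        System.out.println(search(chunks, new double[] {1.0, 0.0}, 0.7, 4));
    }
}
```

This version scores every chunk, which is O(N) per query; a vector index avoids that full scan at the cost of approximate results.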

Modelling Policy Relationships

The RagConfig adds relationships between related policy sections after ingestion:

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel, Driver driver) {
    Neo4jVectorStore store = Neo4jVectorStore.builder(driver, embeddingModel)
            .initializeSchema(true)
            .build();
    ingestWithRelationships(store, driver);
    return store;
}

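private void ingestWithRelationships(Neo4jVectorStore store, Driver driver) {
    // Ingest chunks. policyResource is assumed to be a Resource field on this
    // config class that points at the policy file.
    // Note: this runs on every startup, so a persistent Neo4j volume will
    // accumulate duplicate chunks unless the data is cleared first.
    List<Document> chunks = new TokenTextSplitter()
            .apply(new TikaDocumentReader(policyResource).read());
    store.add(chunks);

    // Add relationships between related policy areas.
    // CONTAINS is case-sensitive. The a <> b check stops a chunk that mentions
    // both phrases from being linked to itself, and MERGE keeps edges unique.
    try (Session session = driver.session()) {
        session.run("""
            MATCH (a:Document), (b:Document)
            WHERE a.text CONTAINS 'parental leave'
              AND b.text CONTAINS 'sick leave'
              AND a <> b
            MERGE (a)-[:RELATED_TO]->(b)
            """);
        session.run("""
            MATCH (a:Document), (b:Document)
            WHERE a.text CONTAINS 'onboarding'
              AND b.text CONTAINS 'IT equipment'
              AND a <> b
            MERGE (a)-[:RELATED_TO]->(b)
            """);
    }
}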
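
The matching rule behind these MERGE statements is a filtered cross product: every chunk that mentions the first phrase is linked to every chunk that mentions the second. The plain-Java sketch below shows that rule on a list of strings so the resulting edges are easy to inspect. The class KeywordLinkSketch and its method link are hypothetical. The sketch also lowercases the text and skips self-pairs, both choices made here for illustration rather than taken from the chapter's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of keyword-based linking; not part of the chapter's code.
final class KeywordLinkSketch {

    /**
     * Returns {fromIndex, toIndex} pairs for every chunk containing fromPhrase
     * paired with every other chunk containing toPhrase. Phrases are expected
     * in lower case because the chunk text is lowercased before matching.
     */
    static List<int[]> link(List<String> texts, String fromPhrase, String toPhrase) {
        List<int[]> edges = new ArrayList<>();
        for (int a = 0; a < texts.size(); a++) {
            boolean aMatches = texts.get(a).toLowerCase(Locale.ROOT).contains(fromPhrase);
            if (!aMatches) continue;
            for (int b = 0; b < texts.size(); b++) {
                if (a == b) continue; // skip self-links
                if (texts.get(b).toLowerCase(Locale.ROOT).contains(toPhrase)) {
                    edges.add(new int[] {a, b});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String> texts = List.of(
                "Parental leave lasts 16 weeks.",
                "Sick leave continuation applies afterwards.",
                "Parental leave may be followed by sick leave.");
        for (int[] edge : link(texts, "parental leave", "sick leave")) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

Because the rule is a cross product, the number of edges grows with the product of the two match counts. For a large policy corpus, curated links or section metadata may scale better than substring matching.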

Comparing All Three Vector Stores

| Concern | SimpleVectorStore | PgVectorStore | Neo4jVectorStore |
| --- | --- | --- | --- |
| Storage | JVM heap | PostgreSQL | Neo4j graph |
| Persistence | No | Yes | Yes |
| Search | O(N) brute-force | O(log N) HNSW | O(log N) + graph traversal |
| Relationships | None | None | First-class |
| Multi-topic queries | Poor | Poor | Excellent |
| Visual exploration | No | psql only | Neo4j Browser UI |
| Setup | Zero | Docker | Docker |

What You Will Build

Same two endpoints as Chapters 7 and 8 — the API does not change:

POST /hr/policy/ask     — ask a question; Graph RAG retrieves connected chunks
POST /hr/policy/ingest  — ingest new policy text at runtime

Test a multi-topic question:

curl -s -X POST http://localhost:8080/hr/policy/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What happens if my parental leave runs out and I am still unwell?"}'

With plain RAG this returns only the parental leave policy. With Graph RAG it returns parental leave and sick leave continuation — because the graph knows they are connected.
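
The same request can be built from Java with the JDK's java.net.http API, which may be handy in an integration test. This is a minimal sketch: the class AskRequestSketch and its methods are hypothetical, the URL and JSON field are taken from the curl command above, and the hand-rolled escaping covers only backslashes and double quotes (a JSON library would be the safer choice for arbitrary input).

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that mirrors the curl command; not part of the chapter's code.
final class AskRequestSketch {

    /** Wraps the question in the JSON body, escaping backslashes and double quotes. */
    static String toJson(String question) {
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"question\": \"" + escaped + "\"}";
    }

    /** Builds (but does not send) the POST request to the ask endpoint. */
    static HttpRequest build(String question) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8080/hr/policy/ask"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(question)))
                .build();
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while the application is running. The shape of the response body depends on the controller from the earlier chapters.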


Summary

In this chapter you will:


What's Next

In Chapter 10, we add Function Calling — the SmartHR bot will call actual Java methods mid-conversation to look up real employee data, check calendar availability, and perform actions, not just answer questions.

Code for this chapter: code/chapter-09-neo4j/