Building RAG Applications with Spring Boot and Spring AI
Retrieval-Augmented Generation (RAG) is a powerful technique that combines information retrieval with large language models (LLMs) to provide accurate, context-aware responses based on your own data. Spring AI makes it easy to build RAG applications in Java.
What is RAG?
RAG enhances LLM responses by:
1. Retrieving relevant information from a knowledge base
2. Augmenting the LLM prompt with retrieved context
3. Generating accurate responses based on your data
This approach reduces hallucinations and provides answers grounded in your specific documents.
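The retrieve and augment steps can be sketched without any framework. The example below is a toy illustration, not how Spring AI implements them: the class `TinyRetriever`, the `Chunk` record, and the hand-written embeddings are all hypothetical, and a real application would get embeddings from a model and leave the search to a vector store.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TinyRetriever {

    /** A stored chunk of text with a precomputed embedding (toy values, hypothetical type). */
    record Chunk(String text, double[] embedding) {}

    /** Cosine similarity of two non-zero vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Step 1 (retrieve): the k chunks most similar to the query embedding, best first. */
    static List<Chunk> retrieve(double[] queryEmbedding, List<Chunk> store, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> -cosine(queryEmbedding, c.embedding())))
                .limit(k)
                .collect(Collectors.toList());
    }

    /** Step 2 (augment): put the retrieved text in front of the question. */
    static String augment(String question, List<Chunk> retrieved) {
        String context = retrieved.stream().map(Chunk::text).collect(Collectors.joining("\n"));
        return "Context:\n" + context + "\n\nQuestion: " + question;
    }
}
```

Step 3 (generate) is simply sending the string returned by `augment` to the LLM.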
Prerequisites
- Java 17 or higher
- Maven or Gradle
- Spring Boot 3.2+
- OpenAI API key (or other LLM provider)
Project Setup
1. Create Spring Boot Project
Use start.spring.io with these dependencies:
- Spring Web
- Spring AI OpenAI
- Spring AI Vector Store (PGVector, Chroma, or Pinecone)
2. Add Dependencies (Maven)
<dependencies>
    <!-- No versions are listed: they are assumed to be managed by the spring-ai-bom
         in dependencyManagement, which start.spring.io adds to the generated pom.xml -->
    <!-- Spring AI -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <!-- Vector Store - Choose one -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>
    <!-- Document Readers -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>
</dependencies>
3. Configuration
# application.properties
# OpenAI Configuration
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4
spring.ai.openai.chat.options.temperature=0.7
# Embedding Model
spring.ai.openai.embedding.options.model=text-embedding-ada-002
# Vector Store (PGVector example)
spring.datasource.url=jdbc:postgresql://localhost:5432/vectordb
spring.datasource.username=postgres
spring.datasource.password=password
spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.dimensions=1536
Building a RAG Application
1. Document Loader Service
package com.example.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import java.util.List;

@Service
public class DocumentLoaderService {

    @Autowired
    private VectorStore vectorStore;

    public void loadPdfDocument(Resource pdfResource) {
        // Read the PDF, one Document per page
        PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(pdfResource);
        List<Document> documents = pdfReader.get();
        // Split documents into chunks using the splitter's default settings
        TokenTextSplitter splitter = new TokenTextSplitter();
        List<Document> chunks = splitter.apply(documents);
        // Embed the chunks and store them in the vector database
        vectorStore.add(chunks);
    }

    public void loadTextDocument(Resource resource) {
        // Tika detects the file type and extracts plain text
        TikaDocumentReader reader = new TikaDocumentReader(resource);
        List<Document> documents = reader.get();
        // TokenTextSplitter has no two-argument constructor and no overlap setting;
        // the full constructor takes these five values
        TokenTextSplitter splitter = new TokenTextSplitter(
                800,    // target chunk size in tokens
                350,    // minimum chunk size in characters
                5,      // minimum chunk length worth embedding
                10000,  // maximum number of chunks per document
                true    // keep separators
        );
        List<Document> chunks = splitter.apply(documents);
        vectorStore.add(chunks);
    }

    public void loadMultipleDocuments(List<Resource> resources) {
        resources.forEach(this::loadTextDocument);
    }
}
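To make "chunk size" and "overlap" concrete, here is a minimal sliding-window chunker in plain Java. It is only a sketch under simplifying assumptions: it counts whitespace-separated words rather than model tokens, and `OverlapChunker` is a hypothetical helper, not part of Spring AI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapChunker {

    /**
     * Splits text into chunks of at most chunkSize words. Each chunk after the
     * first starts with the last `overlap` words of the previous chunk.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

With a chunk size of 4 and an overlap of 1, the text "a b c d e f g" becomes the two chunks "a b c d" and "d e f g", so the word at the boundary appears in both.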
2. RAG Service
package com.example.rag.service;

import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Service
public class RagService {

    @Autowired
    private ChatClient chatClient;

    @Autowired
    private VectorStore vectorStore;

    private static final String SYSTEM_PROMPT = """
            You are a helpful assistant that answers questions based on the provided context.
            Use only the information from the context to answer the question.
            If the answer cannot be found in the context, say "I don't have enough information to answer that."
            Context:
            {context}
            """;

    public String query(String question) {
        // 1. Retrieve relevant documents
        List<Document> relevantDocs = vectorStore.similaritySearch(
                SearchRequest.query(question).withTopK(5)
        );
        // 2. Build context from retrieved documents
        String context = relevantDocs.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n\n"));
        // 3. Create prompt with context
        SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(SYSTEM_PROMPT);
        Message systemMessage = systemPromptTemplate.createMessage(Map.of("context", context));
        UserMessage userMessage = new UserMessage(question);
        // 4. Generate response
        Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
        ChatResponse response = chatClient.call(prompt);
        return response.getResult().getOutput().getContent();
    }

    public RagResponse queryWithSources(String question) {
        // Retrieve relevant documents
        List<Document> relevantDocs = vectorStore.similaritySearch(
                SearchRequest.query(question).withTopK(5)
        );
        String context = relevantDocs.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n\n"));
        // Generate response
        SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(SYSTEM_PROMPT);
        Message systemMessage = systemPromptTemplate.createMessage(Map.of("context", context));
        UserMessage userMessage = new UserMessage(question);
        Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
        ChatResponse response = chatClient.call(prompt);
        String answer = response.getResult().getOutput().getContent();
        // Extract sources; a document may have no "source" metadata entry, so skip nulls
        List<String> sources = relevantDocs.stream()
                .map(doc -> doc.getMetadata().get("source"))
                .filter(source -> source != null)
                .map(Object::toString)
                .distinct()
                .collect(Collectors.toList());
        return new RagResponse(answer, sources);
    }
}
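The service above joins every retrieved chunk into the context. One possible refinement, sketched here in plain Java, is to label each chunk with its source and stop adding chunks once a size budget is reached. The `ContextBuilder` class, the `Snippet` record, and the character-based budget are illustrative assumptions; a production version would more likely budget in tokens.

```java
import java.util.List;

final class ContextBuilder {

    /** Retrieved text plus the label shown to the model (hypothetical type). */
    record Snippet(String source, String text) {}

    /**
     * Joins snippets in ranked order as "[n] (source) text", separated by blank
     * lines, and stops before the character budget would be exceeded.
     */
    static String build(List<Snippet> ranked, int maxChars) {
        StringBuilder context = new StringBuilder();
        int n = 1;
        for (Snippet s : ranked) {
            String entry = "[" + n + "] (" + s.source() + ") " + s.text();
            int separator = context.length() == 0 ? 0 : 2; // length of "\n\n"
            if (context.length() + separator + entry.length() > maxChars) {
                break; // lower-ranked snippets are dropped first
            }
            if (separator > 0) {
                context.append("\n\n");
            }
            context.append(entry);
            n++;
        }
        return context.toString();
    }
}
```

Because the snippets arrive best first, truncating at the budget discards the least relevant material, and the numeric labels give the model something to cite.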
3. Response Model
package com.example.rag.service;
import java.util.List;
public record RagResponse(String answer, List<String> sources) {}
4. REST Controller
package com.example.rag.controller;

import com.example.rag.service.DocumentLoaderService;
import com.example.rag.service.RagResponse;
import com.example.rag.service.RagService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    @Autowired
    private RagService ragService;

    @Autowired
    private DocumentLoaderService documentLoaderService;

    @PostMapping("/upload")
    public ResponseEntity<String> uploadDocument(@RequestParam("file") MultipartFile file) {
        try {
            Resource resource = file.getResource();
            documentLoaderService.loadTextDocument(resource);
            return ResponseEntity.ok("Document uploaded and processed successfully");
        } catch (RuntimeException e) {
            // Nothing in the try block declares IOException, so catching it would not compile;
            // parsing and embedding failures surface as unchecked exceptions
            return ResponseEntity.badRequest().body("Error processing document: " + e.getMessage());
        }
    }

    @PostMapping("/query")
    public ResponseEntity<String> query(@RequestBody QueryRequest request) {
        String answer = ragService.query(request.question());
        return ResponseEntity.ok(answer);
    }

    @PostMapping("/query-with-sources")
    public ResponseEntity<RagResponse> queryWithSources(@RequestBody QueryRequest request) {
        RagResponse response = ragService.queryWithSources(request.question());
        return ResponseEntity.ok(response);
    }
}

// Request body for both query endpoints: {"question": "..."}
record QueryRequest(String question) {}
Advanced Features
1. Custom Embedding Model
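import org.springframework.ai.document.MetadataMode;
import org.springframework.ai.embedding.EmbeddingClient;
import org.springframework.ai.openai.OpenAiEmbeddingClient;
import org.springframework.ai.openai.OpenAiEmbeddingOptions;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EmbeddingConfig {

    @Bean
    public EmbeddingClient embeddingClient(@Value("${spring.ai.openai.api-key}") String apiKey) {
        // text-embedding-3-large returns 3072-dimensional vectors by default, so
        // spring.ai.vectorstore.pgvector.dimensions must be changed from 1536 to match
        return new OpenAiEmbeddingClient(
                new OpenAiApi(apiKey),
                MetadataMode.EMBED,
                OpenAiEmbeddingOptions.builder()
                        .withModel("text-embedding-3-large")
                        .build()
        );
    }
}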
2. Metadata Filtering
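// Requires: import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
public String queryWithFilter(String question, String category) {
    // Build the filter with the expression builder instead of concatenating strings,
    // so a quote inside the category value cannot change the expression
    FilterExpressionBuilder b = new FilterExpressionBuilder();
    SearchRequest searchRequest = SearchRequest.query(question)
            .withTopK(5)
            .withSimilarityThreshold(0.7)
            .withFilterExpression(b.eq("category", category).build());
    List<Document> relevantDocs = vectorStore.similaritySearch(searchRequest);
    // ... rest of the implementation
}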
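What the search request asks the vector store to do can be mimicked in memory: keep only hits whose metadata matches, drop those below the similarity threshold, and return the best few. The sketch below is illustrative only; `FilteredSearchSketch` and `Hit` are hypothetical names, and a real store applies these conditions inside the database rather than in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FilteredSearchSketch {

    /** A candidate hit: text, similarity score in [0, 1], and metadata (hypothetical type). */
    record Hit(String text, double score, Map<String, String> metadata) {}

    /** Hits in the given category with score >= threshold, best first, at most topK. */
    static List<Hit> select(List<Hit> hits, String category, double threshold, int topK) {
        return hits.stream()
                .filter(h -> category.equals(h.metadata().get("category")))
                .filter(h -> h.score() >= threshold)
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .limit(topK)
                .collect(Collectors.toList());
    }
}
```

Note that a strict threshold combined with a narrow category can return an empty list, so the calling code should handle the case where no context is available.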
3. Hybrid Search (Keyword + Semantic)
public List<Document> hybridSearch(String query) {
    // Semantic search
    List<Document> semanticResults = vectorStore.similaritySearch(
            SearchRequest.query(query).withTopK(10)
    );
    // Keyword search: performKeywordSearch is a placeholder to implement
    // based on your vector store or database
    List<Document> keywordResults = performKeywordSearch(query);
    // Combine and re-rank results: combineAndRerank is likewise a placeholder you supply
    return combineAndRerank(semanticResults, keywordResults);
}
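The snippet above leaves the combining step undefined. One common way to merge two ranked lists is reciprocal rank fusion (RRF), sketched here over document IDs. This is a suggestion rather than the method the original code had in mind; `ReciprocalRankFusion` is a hypothetical helper, and the constant `k` (60 is a frequently used value) is an assumption to tune.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReciprocalRankFusion {

    /**
     * Fuses ranked lists of document IDs. Each list contributes 1 / (k + rank)
     * to a document's score, with rank starting at 1. Assumes no ID is repeated
     * within a single list.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties are broken by ID so the order is deterministic
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }
}
```

RRF uses only rank positions, so it works even when the semantic and keyword searches produce scores on different scales. A document that appears in both lists rises above documents that appear in only one.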
4. Conversation Memory
@Service
public class ConversationalRagService {

    @Autowired
    private ChatClient chatClient;

    @Autowired
    private VectorStore vectorStore;

    // Per-session history of user and assistant messages only
    private final Map<String, List<Message>> conversationHistory = new ConcurrentHashMap<>();

    public String queryWithHistory(String sessionId, String question) {
        List<Document> relevantDocs = vectorStore.similaritySearch(
                SearchRequest.query(question).withTopK(5)
        );
        String context = relevantDocs.stream()
                .map(Document::getContent)
                .collect(Collectors.joining("\n\n"));
        List<Message> history = conversationHistory.computeIfAbsent(sessionId, id -> new ArrayList<>());
        // Rebuild the prompt on every turn so the context matches the current question;
        // adding it only on the first turn would discard what later questions retrieve
        List<Message> messages = new ArrayList<>();
        messages.add(new SystemMessage("Context: " + context));
        messages.addAll(history);
        messages.add(new UserMessage(question));
        Prompt prompt = new Prompt(messages);
        ChatResponse response = chatClient.call(prompt);
        String answer = response.getResult().getOutput().getContent();
        // Store the new turn; the system message is not kept in the history
        history.add(new UserMessage(question));
        history.add(new AssistantMessage(answer));
        return answer;
    }
}
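The history kept per session grows with every turn, which eventually exceeds the model's context window and raises cost. A minimal way to bound it is a sliding window that keeps only the most recent turns, sketched below. `BoundedHistory` and `Turn` are hypothetical stand-ins for the message types used above, and counting turns (rather than tokens) is a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BoundedHistory {

    /** One conversation turn (hypothetical stand-in for a chat message). */
    record Turn(String role, String text) {}

    private final Deque<Turn> turns = new ArrayDeque<>();
    private final int maxTurns;

    BoundedHistory(int maxTurns) {
        if (maxTurns <= 0) {
            throw new IllegalArgumentException("maxTurns must be positive");
        }
        this.maxTurns = maxTurns;
    }

    /** Appends a turn and evicts the oldest turns beyond the limit. */
    synchronized void add(String role, String text) {
        turns.addLast(new Turn(role, text));
        while (turns.size() > maxTurns) {
            turns.removeFirst();
        }
    }

    /** Returns a copy of the retained turns, oldest first. */
    synchronized List<Turn> snapshot() {
        return new ArrayList<>(turns);
    }
}
```

The methods are synchronized because two requests for the same session could otherwise modify the history concurrently.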
Testing the RAG Application
1. Upload Documents
curl -X POST http://localhost:8080/api/rag/upload \
-F "file=@/path/to/your-document.pdf"
2. Query the System
curl -X POST http://localhost:8080/api/rag/query \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic of the document?"}'
3. Query with Sources
curl -X POST http://localhost:8080/api/rag/query-with-sources \
-H "Content-Type: application/json" \
-d '{"question": "Explain the key concepts"}'
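The same query can be issued from Java with the JDK's built-in HTTP client. This is a minimal sketch: `RagQueryClient` is a hypothetical class, the base URL is assumed to be the local server used in the curl commands, and the JSON escaping handles only quotes and backslashes, so a JSON library is the better choice for arbitrary input.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class RagQueryClient {

    /** Builds the POST request for /api/rag/query with a JSON body. */
    static HttpRequest buildQueryRequest(String baseUrl, String question) {
        // Minimal escaping: backslashes first, then double quotes
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        String json = "{\"question\": \"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/rag/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Sends the request and returns the response body; the application must be running. */
    static String ask(String baseUrl, String question) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildQueryRequest(baseUrl, question), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `RagQueryClient.ask("http://localhost:8080", "Explain the key concepts")` would return the answer text produced by the `/query` endpoint.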
Best Practices
- Chunk Size: Experiment with chunk sizes (500-1000 tokens typically work well)
- Overlap: Use 10-20% overlap between chunks to maintain context
- Top-K: Start with 3-5 relevant documents, adjust based on results
- Similarity Threshold: Filter out low-relevance documents (0.7 is a reasonable starting point; tune it for your embedding model)
- Prompt Engineering: Craft clear system prompts that guide the model
- Metadata: Add rich metadata to documents for better filtering
- Caching: Cache embeddings to reduce API calls
- Error Handling: Implement robust error handling for API failures
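For the error-handling item, a simple starting point is to retry failed LLM or embedding calls with exponential backoff. The helper below is a generic sketch with hypothetical names (`Retry`, `withBackoff`). It retries on any `RuntimeException`, which is an assumption made for brevity; real code should retry only errors that are likely to be transient, such as timeouts or rate-limit responses.

```java
import java.util.function.Supplier;

final class Retry {

    /**
     * Calls the supplier up to maxAttempts times. After each failed attempt it
     * sleeps, starting at initialDelayMs and doubling the delay every time.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T withBackoff(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no more attempts, so do not sleep again
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2;
            }
        }
        throw last;
    }
}
```

A call such as `Retry.withBackoff(() -> ragService.query(question), 3, 500)` would try up to three times, waiting 500 ms and then 1000 ms between attempts.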
Vector Store Options
PGVector (PostgreSQL)
- Best for: Production applications with existing PostgreSQL
- Pros: ACID compliance, mature ecosystem
- Cons: Requires PostgreSQL setup
Chroma
- Best for: Development and prototyping
- Pros: Easy setup, lightweight to run locally
- Cons: Less scalable for production
Pinecone
- Best for: Large-scale production applications
- Pros: Fully managed, highly scalable
- Cons: Requires external service, cost
Weaviate
- Best for: Complex semantic search requirements
- Pros: Advanced features, GraphQL API
- Cons: More complex setup
Common Use Cases
- Customer Support: Answer questions from documentation
- Knowledge Management: Search internal company documents
- Research Assistant: Query academic papers and reports
- Code Documentation: Search and explain codebases
- Legal Document Analysis: Query contracts and legal texts
Troubleshooting
Low-Quality Responses
- Increase chunk overlap
- Adjust similarity threshold
- Improve document preprocessing
- Use better embedding models
Slow Performance
- Implement caching
- Optimize chunk size
- Use batch processing for uploads
- Consider vector store indexing
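Batch processing for uploads can be as simple as splitting the list of chunks into fixed-size groups and storing one group per call. The partition helper below is a plain-Java sketch; `Batches` is a hypothetical name, and a suitable batch size depends on your embedding provider's limits.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {

    /** Splits items into consecutive sublists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

In the loader service this could be used as `Batches.partition(chunks, 100).forEach(vectorStore::add)`, which keeps each embedding request to a bounded size.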
High Costs
- Cache embeddings
- Use smaller embedding models
- Implement rate limiting
- Optimize chunk sizes
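Caching embeddings can be sketched as a map keyed by a hash of the text, so that identical text is embedded only once. The example below is an in-memory illustration with hypothetical names (`EmbeddingCache`, and an `embedder` function standing in for the real embedding call). It assumes Java 17 for `HexFormat`, and a production cache would also need eviction and probably persistence.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class EmbeddingCache {

    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final Function<String, float[]> embedder;

    /** embedder stands in for the real (paid) embedding call. */
    EmbeddingCache(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    /** Returns the cached embedding for this text, computing it on first use. */
    float[] embed(String text) {
        return cache.computeIfAbsent(sha256(text), key -> embedder.apply(text));
    }

    /** Hex-encoded SHA-256 of the text, used as a fixed-length cache key. */
    private static String sha256(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

The cached array is returned directly, so callers should treat it as read-only. The cache must also be cleared whenever the embedding model changes, because vectors from different models are not comparable.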