RAG Pipeline
Build a complete retrieval-augmented generation pipeline
Overview
This guide walks through building a production RAG pipeline using the knowledge, chain, ai, database, and monitor modules together.
Architecture
Document → Parse → Chunk → Embed → Store → Query → Retrieve → Generate
Step 1: Parse and Chunk Documents
import { parseDocument, chunk } from '@jamaalbuilds/ai-toolkit/knowledge';
const doc = await parseDocument('./company-handbook.pdf');
const chunks = await chunk(doc.content, {
  chunkSize: 512,
  chunkOverlap: 50,
});
// chunks.length — number of chunks created
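chunkSize and chunkOverlap describe a sliding window: each chunk holds up to chunkSize tokens, and consecutive chunks share chunkOverlap tokens so content at a boundary appears in both neighbors. A minimal character-based sketch of the idea (illustrative only, not the library's actual chunker, which counts tokens):

// Illustrative sliding-window chunker. Character-based for simplicity;
// the real chunker works on tokens. chunkOverlap must be < chunkSize.
function naiveChunk(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}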
Step 2: Create Embeddings and Store
import { createKnowledge } from '@jamaalbuilds/ai-toolkit/knowledge';
import { createDatabase, vectorSearchRaw } from '@jamaalbuilds/ai-toolkit/database';
const db = await createDatabase({
  connectionString: process.env.DATABASE_URL!,
});
const knowledge = createKnowledge({
  embedder: async (texts) => {
    // Use your embedding provider
    return texts.map(() => new Array(1536).fill(0)); // placeholder
  },
  store: {
    async upsert(chunks) {
      // Store in your vector DB
    },
    async search(vector, options) {
      const rows = await vectorSearchRaw(db, {
        table: 'documents',
        column: 'embedding',
        queryVector: vector,
        limit: options?.limit ?? 5,
        metric: 'cosine',
      });
      return rows.map((r) => ({
        chunk: { content: r.data.content, metadata: r.data },
        similarity: r.similarity,
      }));
    },
  },
});
await knowledge.ingest('./handbook.pdf', {
  metadata: { source: 'handbook', version: '2024' },
});
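The embedder above returns zero vectors as a placeholder. As one concrete option, here is a sketch backed by OpenAI's embeddings endpoint; text-embedding-3-small also produces 1536-dimensional vectors, and any provider that returns one vector per input text slots in the same way. The OPENAI_API_KEY environment variable is an assumption of this sketch:

// Sketch of a real embedder using OpenAI's /v1/embeddings endpoint.
// Assumes OPENAI_API_KEY is set; swap in your provider of choice as
// long as it returns one vector per input text, in order.
const openAIEmbedder = async (texts: string[]): Promise<number[][]> => {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'text-embedding-3-small', input: texts }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const { data } = await res.json();
  return data.map((d: { embedding: number[] }) => d.embedding);
};

Pass it as the embedder option in createKnowledge above.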
Step 3: Query with RAG
import { rag } from '@jamaalbuilds/ai-toolkit/chain';
import { createAI } from '@jamaalbuilds/ai-toolkit/ai';
const ai = createAI();
const ragChain = rag({
  retriever: {
    retrieve: async (query) => {
      const results = await knowledge.search(query, { limit: 5 });
      return results.map((r) => ({ content: r.chunk.content, metadata: r.chunk.metadata }));
    },
  },
  promptTemplate: `You are a helpful assistant. Answer based on the provided context.

Context:
{context}

Question: {question}

Answer:`,
  model: async (prompt) => {
    const result = await ai.generate(prompt);
    return result.text;
  },
});
const result = await ragChain.invoke({ question: 'What is the vacation policy?' });
// result.answer — the generated response
// result.sources — the retrieved documents
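Because result.sources carries the retrieved chunks, you can surface citations next to the answer. A small sketch, assuming the { content, metadata } shape returned by the retriever above:

// Print the answer followed by the sources that grounded it
// (shape matches what the retriever above returns).
console.log(result.answer);
for (const source of result.sources) {
  const label = source.metadata?.source ?? 'unknown';
  console.log(`[${label}] ${source.content.slice(0, 80)}...`);
}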
Step 4: Add Monitoring
import { createMonitor, trace } from '@jamaalbuilds/ai-toolkit/monitor';
const monitor = await createMonitor();
const query = 'What is the vacation policy?';
const { result } = await trace(monitor, 'rag-query', async (span) => {
  span.update({ input: query });
  const answer = await ragChain.invoke({ question: query });
  span.update({ output: answer });
  return answer;
});
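For finer-grained visibility, you can trace retrieval as its own span so you can inspect exactly which context the model saw. This sketch reuses the trace signature shown above; that trace accepts arbitrary operation names is an assumption:

// Trace retrieval separately to evaluate what context the model received.
// Assumes trace() wraps any named async operation, as in the example above.
const { result: retrieved } = await trace(monitor, 'rag-retrieve', async (span) => {
  span.update({ input: query });
  const docs = await knowledge.search(query, { limit: 5 });
  span.update({ output: { count: docs.length } });
  return docs;
});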
Best Practices
- Chunk size: 256-512 tokens works well for most use cases
- Overlap: 10-20% overlap prevents losing context at boundaries
- Metadata: Tag chunks with source, page, section for filtering
- Reranking: Retrieve more candidates (20+), then rerank to the top 5 (see the sketch after this list)
- Monitoring: Trace every query to evaluate retrieval quality
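A sketch of the retrieve-then-rerank pattern from the reranking point above: pull a wide candidate set, score it, and keep the best few. The keyword-overlap scorer is a deliberately trivial stand-in; in production you would use a cross-encoder or an LLM-based reranker:

// Retrieve a wide candidate set, then rerank down to the top 5.
// overlapScore is a trivial stand-in; swap in a cross-encoder or
// LLM reranker for real relevance judgments.
function overlapScore(query: string, text: string): number {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  const words = text.toLowerCase().split(/\s+/);
  return words.filter((w) => terms.has(w)).length / Math.max(words.length, 1);
}

async function retrieveWithRerank(query: string) {
  const candidates = await knowledge.search(query, { limit: 20 });
  return candidates
    .map((c) => ({ ...c, score: overlapScore(query, c.chunk.content) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);
}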