Building a RAG System with Pinecone and LangChain [2025]
![Building a RAG System with Pinecone and LangChain [2025]](/blog/images/building-rag-system.jpg)
Retrieval-Augmented Generation (RAG) is transforming how we build AI applications. By combining the reasoning capabilities of Large Language Models with precise information retrieval from vector databases, RAG systems provide accurate, contextual responses grounded in your own data. This comprehensive guide shows you how to build a production-ready RAG system using Pinecone, LangChain, and n8n.
Understanding RAG Architecture
RAG systems address a fundamental LLM weakness: hallucination. Instead of relying solely on the model's training data, a RAG pipeline retrieves relevant information from your own knowledge base and supplies it to the model as context at query time.
How RAG Works
RAG System Flow:
1. User Query: "What are our Q4 revenue projections?"
2. Query Embedding: [0.123, -0.456, 0.789, ...] (1536 dimensions)
3. Vector Search (Pinecone): top matches are Financial Report Q4 (similarity 0.92), Revenue Forecast Doc (0.88), and Board Meeting Notes (0.85)
4. Context Assembly: relevant passages + the user query
5. LLM Generation (GPT-4/Claude): "Based on our Q4 financial report, revenue projections are $X million, representing a Y% increase from Q3..."
Key Components
- Vector Database (Pinecone): Stores document embeddings for fast similarity search
- Embedding Model: Converts text to vector representations
- LLM (GPT-4/Claude): Generates responses using retrieved context
- Orchestration (LangChain): Connects components and manages workflow
- Automation (n8n): Handles document ingestion and updates
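At query time these components combine into a short pipeline. The sketch below compresses the whole flow into one function using the service classes built later in this guide (createEmbedding, query, and a ChatOpenAI llm), so treat it as orientation rather than a drop-in implementation:

// The five steps from the flow above, end to end (simplified sketch)
async function answerQuestion(question, embeddings, pinecone, llm) {
  // 1-2. Embed the user query
  const queryVector = await embeddings.createEmbedding(question);
  // 3. Retrieve the most similar chunks from Pinecone
  const matches = await pinecone.query(queryVector, { topK: 5 });
  // 4. Assemble the retrieved passages into a context block
  const context = matches.map(m => m.metadata.text).join('\n\n');
  // 5. Ask the LLM to answer grounded in that context
  const response = await llm.invoke(
    `Answer using only this context:\n${context}\n\nQuestion: ${question}`
  );
  return response.content;
}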
Setting Up Pinecone
Create Pinecone Index
import pinecone
# Initialize Pinecone
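# Note: this example uses the classic pod-based pinecone-client API; newer Pinecone SDKs (v3+)
# initialize with Pinecone(api_key=...) and create serverless indexes via a spec argument.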
pinecone.init(
api_key="your-api-key",
environment="us-west1-gcp"
)
# Create index for OpenAI embeddings (1536 dimensions)
index_name = "knowledge-base"
if index_name not in pinecone.list_indexes():
pinecone.create_index(
name=index_name,
dimension=1536, # OpenAI ada-002 embedding size
metric="cosine", # or "euclidean" or "dotproduct"
pods=1,
pod_type="p1.x1", # Standard pod
metadata_config={
"indexed": ["source", "timestamp", "category"]
}
)
# Connect to index
index = pinecone.Index(index_name)
# Check index stats
print(index.describe_index_stats())
Understanding Index Configuration
Dimensions: Must match your embedding model
- OpenAI text-embedding-ada-002: 1536
- OpenAI text-embedding-3-small: 1536
- OpenAI text-embedding-3-large: 3072
- Cohere embed-english-v3.0: 1024
Metrics:
- Cosine: Best for most use cases, measures angle between vectors
- Euclidean: Measures straight-line distance
- Dotproduct: Faster but requires normalized vectors
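To see why the dot product metric requires normalization, here is a tiny illustrative sketch (standalone helper functions, not part of the services built later): once vectors are unit length, their dot product equals their cosine similarity.

// Illustrative metric helpers: cosine similarity vs. dot product
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}
function cosine(a, b) {
  const norm = v => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}
function normalize(v) {
  const n = Math.sqrt(dot(v, v));
  return v.map(x => x / n);
}
const a = [1, 2, 3];
const b = [2, 3, 4];
console.log(cosine(a, b));                    // ~0.9926
console.log(dot(normalize(a), normalize(b))); // same value: dot product of unit vectors equals cosine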
Pod Types:
- p1.x1: Standard performance (roughly 1M vectors at 768 dimensions, fewer at 1536; ~$70/month)
- p1.x2: 2x the capacity of p1.x1
- s1.x1: Storage-optimized (roughly 5M vectors, slower queries)
- p2: Performance-optimized (fastest queries)
Document Processing Pipeline
Chunking Strategy
Proper chunking is critical for RAG performance:
// document-chunker.js
class DocumentChunker {
constructor(options = {}) {
this.chunkSize = options.chunkSize || 1000; // Characters
this.chunkOverlap = options.chunkOverlap || 200; // Overlap for context
this.separators = options.separators || ['\n\n', '\n', '. ', ' '];
}
chunk(text, metadata = {}) {
const chunks = [];
let startIndex = 0;
while (startIndex < text.length) {
// Determine chunk end
let endIndex = Math.min(startIndex + this.chunkSize, text.length);
// Try to break at natural boundaries (paragraph, sentence, etc.)
if (endIndex < text.length) {
endIndex = this.findBestSplit(text, startIndex, endIndex);
}
const chunk = text.slice(startIndex, endIndex).trim();
if (chunk.length > 0) {
chunks.push({
text: chunk,
metadata: {
...metadata,
chunk_index: chunks.length,
start_char: startIndex,
end_char: endIndex,
},
});
}
      // Stop once the end of the text has been reached; otherwise advance with overlap
      if (endIndex >= text.length) break;
      startIndex = Math.max(endIndex - this.chunkOverlap, startIndex + 1);
    }
return chunks;
}
findBestSplit(text, start, end) {
// Try each separator in order of preference
for (const separator of this.separators) {
const lastIndex = text.lastIndexOf(separator, end);
if (lastIndex > start) {
return lastIndex + separator.length;
}
}
return end;
}
// Special chunking for code files
chunkCode(code, language, metadata = {}) {
const chunks = [];
// Split by functions/classes
const functionRegex = {
javascript: /(?:function|class|const|let|var)\s+\w+/g,
python: /(?:def|class)\s+\w+/g,
java: /(?:public|private|protected)?\s*(?:static)?\s*(?:class|interface|void|[\w<>]+)\s+\w+/g,
};
const regex = functionRegex[language] || functionRegex.javascript;
const matches = [...code.matchAll(regex)];
if (matches.length === 0) {
// No functions found, use regular chunking
return this.chunk(code, { ...metadata, type: 'code' });
}
// Create chunks based on function boundaries
for (let i = 0; i < matches.length; i++) {
const start = matches[i].index;
const end = i < matches.length - 1 ? matches[i + 1].index : code.length;
const chunk = code.slice(start, end).trim();
chunks.push({
text: chunk,
metadata: {
...metadata,
type: 'code',
language,
function_name: matches[i][0],
chunk_index: i,
},
});
}
return chunks;
}
// Markdown-aware chunking
chunkMarkdown(markdown, metadata = {}) {
const chunks = [];
const lines = markdown.split('\n');
let currentChunk = '';
let currentHeading = '';
let currentLevel = 0;
for (const line of lines) {
// Check for headers
const headerMatch = line.match(/^(#{1,6})\s+(.+)$/);
if (headerMatch) {
// Save previous chunk
if (currentChunk.trim().length > 0) {
chunks.push({
text: currentChunk.trim(),
metadata: {
...metadata,
heading: currentHeading,
heading_level: currentLevel,
chunk_index: chunks.length,
},
});
}
// Start new chunk
currentHeading = headerMatch[2];
currentLevel = headerMatch[1].length;
currentChunk = line + '\n';
} else {
currentChunk += line + '\n';
// Check if chunk is getting too large
if (currentChunk.length > this.chunkSize) {
chunks.push({
text: currentChunk.trim(),
metadata: {
...metadata,
heading: currentHeading,
heading_level: currentLevel,
chunk_index: chunks.length,
},
});
currentChunk = '';
}
}
}
// Add final chunk
if (currentChunk.trim().length > 0) {
chunks.push({
text: currentChunk.trim(),
metadata: {
...metadata,
heading: currentHeading,
heading_level: currentLevel,
chunk_index: chunks.length,
},
});
}
return chunks;
}
}
module.exports = DocumentChunker;
// Usage
const chunker = new DocumentChunker({
chunkSize: 1000,
chunkOverlap: 200,
});
const document = 'Your long document text here...';
const chunks = chunker.chunk(document, {
source: 'company-docs',
document_id: 'doc-123',
timestamp: new Date().toISOString(),
});
Generating Embeddings
// embedding-service.js
const OpenAI = require('openai');
class EmbeddingService {
constructor(apiKey) {
this.openai = new OpenAI({ apiKey });
this.model = 'text-embedding-3-small';
this.batchSize = 100; // OpenAI allows up to 2048 inputs per request
}
async createEmbedding(text) {
const response = await this.openai.embeddings.create({
model: this.model,
input: text,
encoding_format: 'float',
});
return response.data[0].embedding;
}
async createEmbeddingsBatch(texts) {
// Split into batches
const batches = [];
for (let i = 0; i < texts.length; i += this.batchSize) {
batches.push(texts.slice(i, i + this.batchSize));
}
// Process batches
const allEmbeddings = [];
for (const batch of batches) {
const response = await this.openai.embeddings.create({
model: this.model,
input: batch,
encoding_format: 'float',
});
allEmbeddings.push(...response.data.map(d => d.embedding));
// Rate limiting
await this.sleep(100);
}
return allEmbeddings;
}
sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
// Calculate cost for embedding generation
calculateCost(numTokens) {
// text-embedding-3-small: $0.02 / 1M tokens
const pricePerMillion = 0.02;
return (numTokens / 1000000) * pricePerMillion;
}
// Estimate tokens
estimateTokens(text) {
// Rough estimate: ~4 characters per token
return Math.ceil(text.length / 4);
}
}
module.exports = EmbeddingService;
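A minimal usage sketch, assuming OPENAI_API_KEY is set and the file above is saved as embedding-service.js:

// Example usage of EmbeddingService (sketch)
const EmbeddingService = require('./embedding-service');
(async () => {
  const embeddings = new EmbeddingService(process.env.OPENAI_API_KEY);
  const texts = ['Q4 revenue projections', 'Board meeting notes'];
  const vectors = await embeddings.createEmbeddingsBatch(texts);
  console.log(vectors.length, 'embeddings of dimension', vectors[0].length); // 2 x 1536
  // Rough cost estimate before embedding a large corpus
  const tokens = texts.reduce((sum, t) => sum + embeddings.estimateTokens(t), 0);
  console.log('Estimated cost: $', embeddings.calculateCost(tokens).toFixed(6));
})();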
Upserting to Pinecone
// pinecone-service.js
const { Pinecone } = require('@pinecone-database/pinecone');
class PineconeService {
constructor(apiKey, environment, indexName) {
this.client = new Pinecone({
apiKey,
environment,
});
this.index = this.client.index(indexName);
}
async upsertChunks(chunks, embeddings) {
const vectors = chunks.map((chunk, i) => ({
id: chunk.metadata.chunk_id || `chunk-${Date.now()}-${i}`,
values: embeddings[i],
metadata: {
text: chunk.text,
source: chunk.metadata.source,
document_id: chunk.metadata.document_id,
chunk_index: chunk.metadata.chunk_index,
timestamp: chunk.metadata.timestamp,
// Add any custom metadata
...chunk.metadata,
},
}));
    // Upsert in batches (Pinecone recommends batches of about 100 vectors per request)
const batchSize = 100;
for (let i = 0; i < vectors.length; i += batchSize) {
const batch = vectors.slice(i, i + batchSize);
await this.index.upsert(batch);
}
return vectors.length;
}
async query(embedding, options = {}) {
const { topK = 5, filter = {}, includeMetadata = true, includeValues = false } = options;
const results = await this.index.query({
vector: embedding,
topK,
filter,
includeMetadata,
includeValues,
});
return results.matches;
}
async deleteByMetadata(filter) {
await this.index.deleteMany({ filter });
}
async updateMetadata(id, metadata) {
await this.index.update({
id,
setMetadata: metadata,
});
}
async getStats() {
return await this.index.describeIndexStats();
}
// Hybrid search: combine vector similarity with metadata filtering
async hybridSearch(embedding, filters = {}, options = {}) {
const { topK = 10, minScore = 0.7, rerank = true } = options;
// First pass: vector search with filters
const results = await this.query(embedding, {
topK: topK * 2, // Get more results for reranking
filter: filters,
includeMetadata: true,
});
// Filter by minimum score
let filtered = results.filter(r => r.score >= minScore);
// Rerank based on additional criteria
if (rerank) {
filtered = this.rerankResults(filtered);
}
return filtered.slice(0, topK);
}
  rerankResults(results) {
    // Combine similarity with a recency bonus so newer documents rank higher
    const now = Date.now();
    const yearMs = 1000 * 60 * 60 * 24 * 365;
    const withFinalScore = results.map(r => {
      const ts = r.metadata.timestamp ? new Date(r.metadata.timestamp).getTime() : now;
      const ageYears = (now - ts) / yearMs;
      // Each year of age costs a small fraction of similarity score; tune the 0.1 weight for your data
      return { ...r, finalScore: r.score - ageYears * 0.1 };
    });
    return withFinalScore.sort((a, b) => b.finalScore - a.finalScore);
  }
}
module.exports = PineconeService;
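As a usage sketch, a query restricted to a single source with Pinecone's metadata filter syntax might look like this (assumes the same environment variables used elsewhere in this guide):

// Example: query with a metadata filter (sketch)
const PineconeService = require('./pinecone-service');
const EmbeddingService = require('./embedding-service');
(async () => {
  const pinecone = new PineconeService(
    process.env.PINECONE_API_KEY,
    process.env.PINECONE_ENV,
    'knowledge-base'
  );
  const embeddings = new EmbeddingService(process.env.OPENAI_API_KEY);
  const vector = await embeddings.createEmbedding('Q4 revenue projections');
  const matches = await pinecone.query(vector, {
    topK: 5,
    filter: { source: { $eq: 'company-docs' } }, // Pinecone metadata filter syntax
  });
  matches.forEach(m => console.log(m.score.toFixed(3), m.metadata.source));
})();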
LangChain Integration
Building the RAG Chain
// rag-chain.js
const { ChatOpenAI } = require('@langchain/openai');
const { PromptTemplate } = require('@langchain/core/prompts');
const { RunnableSequence } = require('@langchain/core/runnables');
const { StringOutputParser } = require('@langchain/core/output_parsers');
class RAGChain {
constructor(pineconeService, embeddingService) {
this.pinecone = pineconeService;
this.embeddings = embeddingService;
// Initialize LLM
this.llm = new ChatOpenAI({
modelName: 'gpt-4-turbo-preview',
temperature: 0.1, // Low temperature for factual responses
maxTokens: 1000,
});
// Create prompt template
this.promptTemplate = PromptTemplate.fromTemplate(`
You are a helpful assistant that answers questions based on the provided context.
Context:
{context}
Question: {question}
Instructions:
- Answer based ONLY on the provided context
- If the context doesn't contain the answer, say "I don't have enough information to answer that"
- Be concise and accurate
- Cite specific parts of the context when possible
Answer:`);
}
async query(question, options = {}) {
const { topK = 5, filters = {}, includeReferences = true } = options;
// 1. Create embedding for the question
const questionEmbedding = await this.embeddings.createEmbedding(question);
// 2. Search Pinecone for relevant chunks
const searchResults = await this.pinecone.hybridSearch(questionEmbedding, filters, {
topK,
minScore: 0.7,
});
if (searchResults.length === 0) {
return {
answer: "I don't have any relevant information to answer that question.",
references: [],
confidence: 0,
};
}
// 3. Format context from search results
const context = this.formatContext(searchResults);
// 4. Generate answer using LLM
const chain = RunnableSequence.from([this.promptTemplate, this.llm, new StringOutputParser()]);
const answer = await chain.invoke({
context,
question,
});
// 5. Calculate confidence score
const avgScore = searchResults.reduce((sum, r) => sum + r.score, 0) / searchResults.length;
return {
answer,
references: includeReferences ? this.formatReferences(searchResults) : [],
confidence: avgScore,
searchResults: searchResults.length,
};
}
formatContext(results) {
return results
.map((result, i) => {
const source = result.metadata.source || 'Unknown';
const text = result.metadata.text;
return `[Source ${i + 1}: ${source}]\n${text}`;
})
.join('\n\n---\n\n');
}
formatReferences(results) {
return results.map((result, i) => ({
index: i + 1,
source: result.metadata.source,
document_id: result.metadata.document_id,
score: result.score,
excerpt: result.metadata.text.substring(0, 200) + '...',
}));
}
// Conversational RAG with chat history
async queryWithHistory(question, chatHistory = [], options = {}) {
// Reformulate question based on chat history
const reformulatedQuestion = await this.reformulateQuestion(question, chatHistory);
// Get answer
const response = await this.query(reformulatedQuestion, options);
return {
...response,
reformulated_question: reformulatedQuestion,
};
}
async reformulateQuestion(question, chatHistory) {
if (chatHistory.length === 0) {
return question;
}
const historyText = chatHistory
.slice(-3) // Last 3 exchanges
.map(h => `Human: ${h.question}\nAssistant: ${h.answer}`)
.join('\n\n');
const reformulationPrompt = `
Given the following conversation history and a new question, reformulate the question to be standalone.
Conversation history:
${historyText}
New question: ${question}
Standalone question:`;
const response = await this.llm.invoke(reformulationPrompt);
return response.content.trim();
}
// Multi-query retrieval
async multiQueryRetrieval(question, options = {}) {
// Generate multiple variations of the question
const variations = await this.generateQueryVariations(question);
// Search for each variation
const allResults = [];
for (const variation of variations) {
const embedding = await this.embeddings.createEmbedding(variation);
const results = await this.pinecone.query(embedding, {
topK: 3,
...options,
});
allResults.push(...results);
}
// Deduplicate and rank
const uniqueResults = this.deduplicateResults(allResults);
const topResults = uniqueResults.slice(0, options.topK || 5);
// Generate answer
const context = this.formatContext(topResults);
const answer = await this.llm.invoke(await this.promptTemplate.format({ context, question }));
return {
answer: answer.content,
references: this.formatReferences(topResults),
query_variations: variations,
};
}
async generateQueryVariations(question) {
const prompt = `
Generate 3 different ways to ask the following question:
Original question: ${question}
Variations:
1.`;
const response = await this.llm.invoke(prompt);
const variations = response.content
.split('\n')
.filter(line => line.match(/^\d+\./))
.map(line => line.replace(/^\d+\.\s*/, '').trim());
return [question, ...variations];
}
deduplicateResults(results) {
const seen = new Set();
const unique = [];
for (const result of results) {
const id = result.metadata.chunk_id || result.id;
if (!seen.has(id)) {
seen.add(id);
unique.push(result);
}
}
// Sort by score
return unique.sort((a, b) => b.score - a.score);
}
}
module.exports = RAGChain;
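A short sketch of the conversational path, assuming rag is the RAGChain instance created in the complete example at the end of this guide:

// Example: conversational RAG with chat history (sketch)
const history = [];
async function chat(rag, question) {
  const response = await rag.queryWithHistory(question, history, { topK: 5 });
  history.push({ question, answer: response.answer });
  return response;
}
// chat(rag, 'What are our Q4 revenue projections?')
//   .then(() => chat(rag, 'How does that compare to Q3?'));
// The follow-up is reformulated into a standalone question before retrieval.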
n8n Workflow for Document Ingestion
Automated Document Processing
{
"name": "Document Ingestion Pipeline",
"nodes": [
{
"name": "Webhook - New Document",
"type": "n8n-nodes-base.webhook",
"parameters": {
"path": "ingest-document",
"responseMode": "responseNode",
"options": {}
}
},
{
"name": "Download Document",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"url": "={{ $json.document_url }}",
"responseFormat": "text"
}
},
{
"name": "Detect Document Type",
"type": "n8n-nodes-base.function",
"parameters": {
"functionCode": "const url = $json.document_url;\nconst extension = url.split('.').pop().toLowerCase();\n\nreturn {\n content: $json.body,\n type: extension,\n metadata: {\n source: $json.source || 'upload',\n document_id: $json.document_id,\n timestamp: new Date().toISOString()\n }\n};"
}
},
{
"name": "Chunk Document",
"type": "n8n-nodes-base.code",
"parameters": {
"language": "javascript",
"jsCode": "const DocumentChunker = require('./document-chunker');\n\nconst chunker = new DocumentChunker({\n chunkSize: 1000,\n chunkOverlap: 200\n});\n\nconst content = $input.item.json.content;\nconst type = $input.item.json.type;\nconst metadata = $input.item.json.metadata;\n\nlet chunks;\nif (type === 'md' || type === 'markdown') {\n chunks = chunker.chunkMarkdown(content, metadata);\n} else if (['js', 'py', 'java'].includes(type)) {\n chunks = chunker.chunkCode(content, type, metadata);\n} else {\n chunks = chunker.chunk(content, metadata);\n}\n\nreturn chunks;"
}
},
{
"name": "Generate Embeddings",
"type": "n8n-nodes-base.openAi",
"parameters": {
"operation": "embeddings",
"model": "text-embedding-3-small",
"text": "={{ $json.text }}"
}
},
{
"name": "Upsert to Pinecone",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"method": "POST",
"url": "https://{{ $env.PINECONE_INDEX }}-{{ $env.PINECONE_ENV }}.svc.pinecone.io/vectors/upsert",
"authentication": "genericCredentialType",
"headers": {
"Api-Key": "={{ $env.PINECONE_API_KEY }}",
"Content-Type": "application/json"
},
"body": {
"vectors": [
{
"id": "={{ $json.metadata.document_id }}-chunk-{{ $json.metadata.chunk_index }}",
"values": "={{ $json.embedding }}",
"metadata": "={{ $json.metadata }}"
}
]
}
}
},
{
"name": "Send Success Response",
"type": "n8n-nodes-base.respondToWebhook",
"parameters": {
"respondWith": "json",
"responseBody": "={{ { success: true, chunks_processed: $items().length } }}"
}
}
]
}
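Once the workflow is active, any service can push documents into it. The sketch below assumes your n8n instance is reachable at https://your-n8n-instance.com and that the PINECONE_INDEX_HOST variable used by the upsert node holds your index's dedicated host from the Pinecone console:

// Trigger the ingestion pipeline from another service (sketch, Node 18+ global fetch)
async function ingestViaWebhook(documentUrl, documentId) {
  const res = await fetch('https://your-n8n-instance.com/webhook/ingest-document', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      document_url: documentUrl,
      document_id: documentId,
      source: 'upload',
    }),
  });
  return res.json(); // { success: true, chunks_processed: N }
}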
Advanced RAG Techniques
Contextual Compression
Reduce context size while preserving relevance:
class ContextualCompressor {
constructor(llm) {
this.llm = llm;
}
async compressContext(question, documents) {
const compressionPrompt = `
Given a question and a document, extract only the parts that are relevant to answering the question.
Question: ${question}
Document:
{document}
Relevant excerpts (preserve exact quotes):`;
const compressed = [];
for (const doc of documents) {
const prompt = compressionPrompt.replace('{document}', doc.metadata.text);
const response = await this.llm.invoke(prompt);
if (response.content.trim().length > 0) {
compressed.push({
...doc,
compressed_text: response.content.trim(),
});
}
}
return compressed;
}
}
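Plugged into the retrieval path, compression runs after the Pinecone search and before context assembly. A usage sketch, assuming rag is the RAGChain from earlier and compressor is a ContextualCompressor built around the same LLM:

// Example: compress retrieved chunks before building the prompt context (sketch)
async function retrieveCompressed(question, rag, compressor) {
  const embedding = await rag.embeddings.createEmbedding(question);
  const results = await rag.pinecone.query(embedding, { topK: 8 });
  const compressed = await compressor.compressContext(question, results);
  // Use the compressed excerpts instead of the full chunk text
  return compressed.map(d => d.compressed_text).join('\n\n---\n\n');
}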
Hypothetical Document Embeddings (HyDE)
Improve retrieval by generating hypothetical answers:
class HyDERetriever {
constructor(llm, pineconeService, embeddingService) {
this.llm = llm;
this.pinecone = pineconeService;
this.embeddings = embeddingService;
}
async retrieve(question, topK = 5) {
// Generate hypothetical answer
const hydePrompt = `
Write a detailed answer to the following question. Make it specific and factual.
Question: ${question}
Answer:`;
const hypotheticalAnswer = await this.llm.invoke(hydePrompt);
// Embed the hypothetical answer
const embedding = await this.embeddings.createEmbedding(hypotheticalAnswer.content);
// Search using hypothetical answer embedding
const results = await this.pinecone.query(embedding, { topK });
return results;
}
}
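Usage mirrors the standard retriever; only the embedding that is searched changes. A sketch, assuming pinecone and embeddings are the service instances created earlier:

// Example: HyDE retrieval (sketch)
const { ChatOpenAI } = require('@langchain/openai');
const hydeLlm = new ChatOpenAI({ modelName: 'gpt-4-turbo-preview', temperature: 0.3 });
const hyde = new HyDERetriever(hydeLlm, pinecone, embeddings);
// hyde.retrieve('What are our Q4 revenue projections?', 5)
//   .then(results => console.log(results.map(r => r.metadata.source)));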
Parent Document Retrieval
Retrieve small chunks but provide larger context:
class ParentDocumentRetriever {
constructor(pineconeService) {
this.pinecone = pineconeService;
}
async retrieve(embedding, options = {}) {
const { topK = 5 } = options;
// Search for small chunks
const childResults = await this.pinecone.query(embedding, { topK: topK * 2 });
// Get parent documents
const parentDocs = new Map();
for (const result of childResults) {
const parentId = result.metadata.parent_id;
if (!parentDocs.has(parentId)) {
// Fetch full parent document
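        // (assumes PineconeService exposes a fetch(ids) helper that wraps the Pinecone index.fetch() call)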
const parent = await this.pinecone.fetch([parentId]);
parentDocs.set(parentId, {
...parent,
max_score: result.score,
});
} else {
// Update score if this chunk is more relevant
const existing = parentDocs.get(parentId);
if (result.score > existing.max_score) {
existing.max_score = result.score;
}
}
}
// Sort by best child score
return Array.from(parentDocs.values())
.sort((a, b) => b.max_score - a.max_score)
.slice(0, topK);
}
}
Evaluation and Monitoring
RAG Metrics
class RAGEvaluator {
constructor(llm) {
this.llm = llm;
}
async evaluateAnswer(question, answer, groundTruth, retrievedDocs) {
const metrics = {};
// 1. Answer Relevancy
metrics.relevancy = await this.evaluateRelevancy(question, answer);
// 2. Answer Correctness (if ground truth available)
if (groundTruth) {
metrics.correctness = await this.evaluateCorrectness(answer, groundTruth);
}
// 3. Faithfulness (answer grounded in retrieved docs)
metrics.faithfulness = await this.evaluateFaithfulness(answer, retrievedDocs);
// 4. Context Relevancy
metrics.contextRelevancy = await this.evaluateContextRelevancy(question, retrievedDocs);
return metrics;
}
async evaluateRelevancy(question, answer) {
const prompt = `
Rate how well the answer addresses the question on a scale of 0-1.
Question: ${question}
Answer: ${answer}
Rating (0-1):`;
const response = await this.llm.invoke(prompt);
return parseFloat(response.content.trim());
}
async evaluateFaithfulness(answer, documents) {
const context = documents.map(d => d.metadata.text).join('\n\n');
const prompt = `
Rate how well the answer is supported by the context on a scale of 0-1.
1.0 means fully supported, 0.0 means not supported at all.
Context:
${context}
Answer: ${answer}
Rating (0-1):`;
const response = await this.llm.invoke(prompt);
return parseFloat(response.content.trim());
}
async evaluateCorrectness(answer, groundTruth) {
const prompt = `
Rate how correct the answer is compared to the ground truth on a scale of 0-1.
Ground Truth: ${groundTruth}
Answer: ${answer}
Rating (0-1):`;
const response = await this.llm.invoke(prompt);
return parseFloat(response.content.trim());
}
async evaluateContextRelevancy(question, documents) {
const relevantCount = documents.filter(d => d.score > 0.7).length;
return relevantCount / documents.length;
}
}
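A sketch of running the evaluator against a single query, assuming llm is a ChatOpenAI instance and rag is the RAGChain built earlier:

// Example: score one question/answer pair (sketch)
const evaluator = new RAGEvaluator(llm);
async function evaluateQuery(rag, question, groundTruth = null) {
  const embedding = await rag.embeddings.createEmbedding(question);
  const retrievedDocs = await rag.pinecone.query(embedding, { topK: 5 });
  const { answer } = await rag.query(question);
  const metrics = await evaluator.evaluateAnswer(question, answer, groundTruth, retrievedDocs);
  console.log(metrics); // { relevancy, correctness (if ground truth), faithfulness, contextRelevancy }
  return metrics;
}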
Monitoring RAG Performance
// Track RAG metrics
const trackRAGMetrics = async (query, response, duration) => {
await metrics.recordQuery({
query,
num_results: response.searchResults,
avg_confidence: response.confidence,
duration_ms: duration,
timestamp: new Date(),
});
// Alert on low confidence
if (response.confidence < 0.5) {
await sendAlert({
type: 'low_confidence_answer',
query,
confidence: response.confidence,
});
}
};
Production Best Practices
1. Caching
class RAGCache {
constructor(redis) {
this.redis = redis;
this.ttl = 3600; // 1 hour
}
async get(question) {
const cached = await this.redis.get(`rag:${this.hashQuestion(question)}`);
return cached ? JSON.parse(cached) : null;
}
async set(question, response) {
await this.redis.setex(
`rag:${this.hashQuestion(question)}`,
this.ttl,
JSON.stringify(response)
);
}
hashQuestion(question) {
const crypto = require('crypto');
return crypto.createHash('md5').update(question.toLowerCase()).digest('hex');
}
}
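Wired in front of the RAG chain, the cache short-circuits repeat questions. A sketch using ioredis (an assumption; any Redis client exposing get/setex works):

// Example: cache-aside pattern around the RAG chain (sketch)
const Redis = require('ioredis');
const cache = new RAGCache(new Redis(process.env.REDIS_URL));
async function cachedQuery(rag, question) {
  const cached = await cache.get(question);
  if (cached) return cached;            // Cache hit: skip embedding, search, and LLM calls
  const response = await rag.query(question);
  await cache.set(question, response);  // Store the full response for identical follow-up questions
  return response;
}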
2. Rate Limiting
const rateLimit = require('express-rate-limit');
const ragLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 20, // 20 requests per minute
message: 'Too many queries, please try again later',
});
app.post('/api/rag/query', ragLimiter, async (req, res) => {
// Handle RAG query
});
3. Cost Optimization
// Track costs
const trackCosts = {
embedding: numTokens => (numTokens * 0.00002) / 1000, // $0.02/1M tokens
llm: (inputTokens, outputTokens) => {
return (inputTokens * 0.01) / 1000 + (outputTokens * 0.03) / 1000;
},
pinecone: numQueries => numQueries * 0.000004, // Approximate
};
// Optimize by caching and batch processing
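For example, a rough per-query estimate using the rates above and illustrative token counts:

// Rough per-query cost estimate (illustrative numbers)
const queryCost =
  trackCosts.embedding(20) +    // ~20 tokens to embed the question
  trackCosts.llm(3000, 400) +   // ~3k context tokens in, ~400 tokens out
  trackCosts.pinecone(1);       // one vector query
console.log(`~$${queryCost.toFixed(4)} per query`); // roughly $0.042 with these assumptions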
Complete Example
// main.js - Complete RAG implementation
const EmbeddingService = require('./embedding-service');
const PineconeService = require('./pinecone-service');
const RAGChain = require('./rag-chain');
const DocumentChunker = require('./document-chunker');
// Initialize services
const embeddings = new EmbeddingService(process.env.OPENAI_API_KEY);
const pinecone = new PineconeService(
process.env.PINECONE_API_KEY,
process.env.PINECONE_ENV,
'knowledge-base'
);
const rag = new RAGChain(pinecone, embeddings);
// Example: Ingest document
async function ingestDocument(text, metadata) {
const chunker = new DocumentChunker();
const chunks = chunker.chunk(text, metadata);
const texts = chunks.map(c => c.text);
const embeddingVectors = await embeddings.createEmbeddingsBatch(texts);
await pinecone.upsertChunks(chunks, embeddingVectors);
console.log(`Ingested ${chunks.length} chunks`);
}
// Example: Query
async function query(question) {
const response = await rag.query(question, {
topK: 5,
includeReferences: true,
});
console.log('Answer:', response.answer);
console.log('Confidence:', response.confidence);
console.log('References:', response.references);
return response;
}
// Run examples
(async () => {
// Ingest
await ingestDocument('Your document text here...', { source: 'docs', category: 'technical' });
// Query
await query('What is the main topic discussed?');
})();
Join the Community
Building production RAG systems requires expertise in AI, databases, and system architecture. The House of Loops community brings together AI engineers and developers building advanced RAG applications.
Join us to:
- Share RAG implementations and best practices
- Get feedback on your RAG architecture
- Access production-ready RAG templates
- Participate in AI/LLM workshops
- Connect with developers building similar systems
Join House of Loops Today and get $100K+ in startup credits including OpenAI and Pinecone credits to build your RAG system.
Building a RAG system? Our community has AI engineers ready to help optimize your implementation!
House of Loops Team
House of Loops is a technology-focused community for learning and implementing advanced automation workflows using n8n, Strapi, AI/LLM, and DevSecOps tools.