
RAG (Retrieval-Augmented Generation) and MFO Server

The MFO Server implements RAG (Retrieval-Augmented Generation) through a dedicated rag provider module. Here’s how it works:


🔍 How It Manages RAG

1️⃣ Document & Embeddings Storage:

  • The system has two key PostgreSQL tables:
    • rag.documents: stores metadata and full content of documents (e.g., user_id, collection, tags, content).
    • rag.embeddings: stores individual text chunks along with their embedding vectors (linked to rag.documents via document_id).

2️⃣ Chunking & Embedding:

  • When you add a document:
    • It is chunked (split into parts).
    • Each chunk is sent to an LLM provider, which generates its embedding vector.
    • Both the document metadata and the embeddings are saved (bulk-inserted, with vectors stored via pgvector).
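A minimal chunker illustrates the splitting step. The fixed rune window and overlap here are assumptions, not the server's actual strategy (which may split on sentences or tokens).

```go
package main

import "fmt"

// chunkText splits text into fixed-size rune chunks with a small overlap,
// so text cut at a boundary still appears intact in one chunk.
// The window and overlap sizes are illustrative assumptions.
func chunkText(text string, size, overlap int) []string {
	runes := []rune(text)
	var chunks []string
	step := size - overlap
	if step <= 0 {
		step = size
	}
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	for i, c := range chunkText("How to reset a password in the support portal.", 20, 5) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```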

3️⃣ Search (Vector Similarity):

  • The rag.search tool:
    • Converts the query into an embedding.
    • Runs a vector similarity search using PostgreSQL’s pgvector extension.
    • Returns the best-matching chunks, ranked by cosine similarity.
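A query along these lines could back the search step. This is a sketch only — the actual SQL, join, and operator choice in MFO Server are assumptions; pgvector's `<=>` operator computes cosine distance, so `1 - distance` yields cosine similarity.

```go
package main

import "fmt"

// buildSearchQuery returns a pgvector similarity query scoped to a tenant.
// Table and column names follow the simplified schema in this document;
// the server's real SQL is an assumption.
func buildSearchQuery(limit int) string {
	return fmt.Sprintf(`
SELECT e.chunk,
       1 - (e.embedding <=> $1::vector) AS similarity
FROM rag.embeddings e
JOIN rag.documents d ON d.id = e.document_id
WHERE d.user_id = $2
  AND d.collection = $3
ORDER BY e.embedding <=> $1::vector
LIMIT %d`, limit)
}

func main() {
	fmt.Println(buildSearchQuery(5))
}
```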

4️⃣ Multi-Tenant Design:

  • All RAG operations are scoped by user_id and collection to keep data isolated and organized.

🛠️ Key Tools

| Tool | Description |
| --- | --- |
| rag.add_document | Adds a single document (with chunking & embedding). |
| rag.add_documents | Adds multiple documents in a batch. |
| rag.search | Searches for relevant chunks using vector similarity. |

Example: The search tool parses parameters like query, user_id, collection, and optional tags, and returns results ranked by similarity.
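The parameter handling can be sketched as follows; the Go struct, JSON field names, and validation rules are assumptions based on the parameters listed above, not the tool's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchParams mirrors the parameters the rag.search tool accepts, per the
// description above; the wire format is an assumption.
type SearchParams struct {
	Query      string   `json:"query"`
	UserID     string   `json:"user_id"`
	Collection string   `json:"collection"`
	Tags       []string `json:"tags,omitempty"` // optional filter
}

// parseSearchParams decodes and validates a raw tool-call payload.
func parseSearchParams(raw []byte) (SearchParams, error) {
	var p SearchParams
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, err
	}
	if p.Query == "" || p.UserID == "" || p.Collection == "" {
		return p, fmt.Errorf("query, user_id and collection are required")
	}
	return p, nil
}

func main() {
	p, err := parseSearchParams([]byte(`{"query":"How to reset password?","user_id":"42","collection":"support_docs"}`))
	fmt.Println(p, err)
}
```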


⚙️ LLM & Config

  • LLM Provider: The system uses a pluggable LLM manager to generate embeddings (e.g., OpenAI’s text-embedding-ada-002 with 1536 dimensions).
  • Configuration: The embedding dimension is loaded from environment variables (e.g., RAG_EMBEDDING_DIMENSION).

🔄 Query Flow Example

1️⃣ Input: A user submits a query with user_id = 42, collection = "support_docs", and query = "How to reset password?".

2️⃣ Processing:

  • The query is embedded.
  • A similarity search finds the top 5 chunks in the "support_docs" collection.

3️⃣ Output:

  • Returns chunks with similarity scores, ready to be fed into an LLM for final generation.
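The flow above can be sketched with stub providers. The `Embedder` and `Searcher` interfaces and the stub return values are assumptions for illustration, not the server's actual API.

```go
package main

import "fmt"

// Embedder abstracts the pluggable LLM provider used to embed text;
// the interface shape is an assumption.
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// Searcher abstracts the vector-similarity lookup (backed by pgvector in
// the real server); also an assumed interface.
type Searcher interface {
	TopK(embedding []float32, userID, collection string, k int) ([]string, error)
}

// retrieve implements the flow above: embed the query, then fetch the
// top-k chunks scoped to the caller's user_id and collection.
func retrieve(e Embedder, s Searcher, query, userID, collection string, k int) ([]string, error) {
	vec, err := e.Embed(query)
	if err != nil {
		return nil, err
	}
	return s.TopK(vec, userID, collection, k)
}

// Stubs so the sketch runs without an LLM or a database.
type stubEmbedder struct{}

func (stubEmbedder) Embed(text string) ([]float32, error) {
	return []float32{float32(len(text))}, nil
}

type stubSearcher struct{}

func (stubSearcher) TopK(_ []float32, _, _ string, k int) ([]string, error) {
	return []string{"Open Settings and choose Reset Password."}, nil
}

func main() {
	chunks, _ := retrieve(stubEmbedder{}, stubSearcher{}, "How to reset password?", "42", "support_docs", 5)
	fmt.Println(chunks)
}
```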

🗂️ Data Schema (Simplified)

```mermaid
erDiagram
    DOCUMENTS {
        UUID id PK
        STRING user_id
        STRING collection
        TEXT content
        TEXT[] tags
    }
    EMBEDDINGS {
        UUID id PK
        UUID document_id FK
        TEXT chunk
        VECTOR embedding
    }
    DOCUMENTS ||--o{ EMBEDDINGS : contains
```

✅ Special Features

  • Bulk Inserts: Uses pgx.CopyFrom for efficient embedding storage.
  • Full-Text + Vector Search: Although the system focuses on vector similarity, a content_tsvector column enables optional full-text search.
  • Extensible Tools: New RAG tools can be added via adapter.go to expose additional functionality.
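The pgx.CopyFrom bulk insert can be sketched like this. Only the row-shaping helper runs without a database; the table and column names follow the simplified schema above and are assumptions about the real code.

```go
package main

import "fmt"

// embeddingRow is one chunk ready for insertion into rag.embeddings.
type embeddingRow struct {
	DocumentID string
	Chunk      string
	Embedding  []float32
}

// copyFromRows shapes rows for pgx.CopyFrom, which streams them to
// PostgreSQL over the COPY protocol. The slice order must match the
// column list passed to CopyFrom.
func copyFromRows(rows []embeddingRow) [][]interface{} {
	out := make([][]interface{}, 0, len(rows))
	for _, r := range rows {
		out = append(out, []interface{}{r.DocumentID, r.Chunk, r.Embedding})
	}
	return out
}

// With a live *pgx.Conn the bulk insert would look roughly like:
//
//	conn.CopyFrom(ctx,
//	    pgx.Identifier{"rag", "embeddings"},
//	    []string{"document_id", "chunk", "embedding"},
//	    pgx.CopyFromRows(copyFromRows(rows)))

func main() {
	rows := copyFromRows([]embeddingRow{{DocumentID: "d1", Chunk: "hello", Embedding: []float32{0.1, 0.2}}})
	fmt.Println("rows to copy:", len(rows))
}
```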

In summary, the MFO Server’s RAG system provides a full pipeline from document ingestion to retrieval and augmentation, with tight integration between PostgreSQL (via pgvector), LLMs, and Go-based tooling.