rag-mcp — a real, citation-backed MCP server

// second MCP proof · 50 tests · local embeddings · $0 inference

Python MCP SDK · 50 tests · local ONNX · ChromaDB · corpus-scoped

The problem

"Let the agent search our docs" sounds simple until you ask the hard questions: where do the embeddings run (and what do they cost), can the tool be tricked into reading files outside the approved corpus, what happens when the vector store is empty or down, and can you trust an answer you can't trace back to a source? A retrieval tool that gets any of those wrong is worse than no tool at all.

What I built

rag-mcp exposes one tool — search_knowledge(query, k) — over the Python MCP SDK on stdio transport. It embeds the query with a local model, vector-searches a local corpus, and returns cited passages. Deliberately small surface area, production-grade edges:

Local + $0: embeddings run on CPU via ONNX using BAAI/bge-large-en-v1.5 (1024-dim); there is no per-query API cost and no data leaves the box.
Zero-infra vector store: ChromaDB's embedded PersistentClient; nothing to stand up.
Auth-scoped to a corpus root: sources that try to escape it (absolute paths, .. traversal) are refused. An agent can't read across the wall.
Fail-soft: a down or empty store returns a structured error, never an exception that crashes the calling agent.
Citations on every hit: each result carries source + heading + chunk_index, so an answer is always traceable.
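
To make the corpus-root rule concrete, here is a minimal sketch of that kind of check. It is an illustration, not rag-mcp's actual code: the function name resolve_in_corpus and its signature are hypothetical, and it assumes sources arrive as paths relative to the corpus root.

```python
from pathlib import Path
from typing import Optional


def resolve_in_corpus(corpus_root: str, source: str) -> Optional[Path]:
    """Return the resolved path if `source` stays inside `corpus_root`, else None.

    Hypothetical helper for illustration; requires Python 3.9+ (is_relative_to).
    """
    root = Path(corpus_root).resolve()
    candidate = Path(source)

    # Absolute sources are refused outright: they never belong to the corpus.
    if candidate.is_absolute():
        return None

    # Resolve ".." segments (and symlinks, where they exist) before comparing.
    resolved = (root / candidate).resolve()

    # After resolution the path must still sit under the corpus root.
    if not resolved.is_relative_to(root):
        return None
    return resolved


inside = resolve_in_corpus("corpus", "docs/guide.md")
escaped = resolve_in_corpus("corpus", "../secrets.txt")
```

In this sketch, inside is a path under the corpus root and escaped is None. Resolving before comparing is the important design choice: a plain string-prefix check would let docs/../../secrets.txt through.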

Evidence you can check

Tests: 50 passed across test_chunking.py, test_cli.py, test_ingest.py, test_lock.py, test_search.py, test_server.py, test_store.py
Coverage spans the full path: chunking → ingestion → search → the MCP server surface.
Runs as a registered MCP server; the search_knowledge tool is callable from an agent session.

The same four edges that matter in a client build — local/cost-controlled, auth-scoped, fail-soft, and traceable — are exactly what this 50-test suite exercises. Small tool, full rails.
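
As an illustration of the fail-soft and traceable edges, the sketch below wraps a search backend so the caller always gets a dictionary back. The wrapper name search_fail_soft, the error codes, and the backend callable are assumptions made for this example; only the citation fields (source, heading, chunk_index) come from the description above.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]


def search_fail_soft(
    query: str, k: int, backend: Callable[[str, int], List[Hit]]
) -> Dict[str, Any]:
    """Call a (hypothetical) vector-store backend and always return a dict."""
    try:
        hits = backend(query, k)
        # Keep the passage text plus the three citation fields on every hit.
        results = [
            {
                "text": h["text"],
                "source": h["source"],
                "heading": h["heading"],
                "chunk_index": h["chunk_index"],
            }
            for h in hits[:k]
        ]
    except Exception as exc:
        # A down or malformed store becomes a structured error, not a crash.
        return {
            "ok": False,
            "error": "store_unavailable",
            "detail": str(exc),
            "results": [],
        }
    if not results:
        # An empty store (or no match) is reported explicitly as well.
        return {"ok": False, "error": "no_results", "detail": "", "results": []}
    return {"ok": True, "error": None, "results": results}


def offline_store(query: str, k: int) -> List[Hit]:
    # Stand-in for a vector store that cannot be reached.
    raise ConnectionError("vector store is not reachable")


outcome = search_fail_soft("how do I rotate keys?", 3, offline_store)
print(outcome["ok"], outcome["error"])  # prints: False store_unavailable
```

The calling agent can branch on ok and error without a try/except of its own, and every successful hit carries enough to trace the passage back to its source.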

What it shows about how I work

A good MCP tool isn't measured by how many features it has — it's measured by whether the trust boundary holds, the failure modes are clean, and the output is verifiable. rag-mcp is a compact demonstration of all three, the way I'd build a retrieval tool against your corpus.
