rag-mcp — a real, citation-backed MCP server
Python MCP SDK · 50 tests · local ONNX · ChromaDB · corpus-scoped
The problem
"Let the agent search our docs" sounds simple until you ask the hard questions: where do the embeddings run (and what do they cost), can the tool be tricked into reading files outside the approved corpus, what happens when the vector store is empty or down, and can you trust an answer you can't trace back to a source? A retrieval tool that gets any of those wrong is worse than no tool at all.
What I built
rag-mcp exposes one tool — search_knowledge(query, k) — over the
Python MCP SDK on stdio transport. It embeds the query with a local model, vector-searches a
local corpus, and returns cited passages. Deliberately small surface area, production-grade edges:
- Local + $0: embeddings run on CPU via ONNX BAAI/bge-large-en-v1.5 (1024-dim); no per-query API cost, no data leaving the box.
- Zero-infra vector store: ChromaDB embedded PersistentClient; nothing to stand up.
- Auth-scoped to a corpus root: sources that try to escape it (absolute paths, .. traversal) are refused. An agent can't read across the wall.
- Fail-soft: a down or empty store returns a structured error, never an exception that crashes the calling agent.
- Citations on every hit: each result carries source + heading + chunk_index, so an answer is always traceable.
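The corpus-root scoping described above can be sketched with the standard library alone. This is a minimal illustration, not the rag-mcp implementation: the names resolve_in_corpus, is_allowed, and SourceOutsideCorpus are hypothetical, and the sketch assumes sources arrive as path strings relative to the corpus root.

```python
from pathlib import Path


class SourceOutsideCorpus(ValueError):
    """Raised when a requested source would resolve outside the corpus root."""


def resolve_in_corpus(corpus_root: Path, source: str) -> Path:
    """Return the resolved path for `source`, or raise if it escapes the root.

    Hypothetical helper: the name and signature are illustrative only.
    """
    root = corpus_root.resolve()
    candidate = Path(source)
    # Refuse absolute paths outright rather than trying to re-root them.
    if candidate.is_absolute():
        raise SourceOutsideCorpus(f"absolute path refused: {source}")
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real target, not the spelling.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):  # Python 3.9+
        raise SourceOutsideCorpus(f"path escapes corpus root: {source}")
    return resolved


def is_allowed(corpus_root: Path, source: str) -> bool:
    """Boolean convenience wrapper around resolve_in_corpus."""
    try:
        resolve_in_corpus(corpus_root, source)
    except SourceOutsideCorpus:
        return False
    return True
```

Checking the resolved path rather than the raw string is the point of the sketch: a source such as docs/../../x contains no leading .. yet still lands outside the root, and only resolution catches it.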
Evidence you can check
- Tests: 50 passed across test_chunking.py, test_cli.py, test_ingest.py, test_lock.py, test_search.py, test_server.py, test_store.py
- Coverage spans the full path: chunking → ingestion → search → the MCP server surface.
- Runs as a registered MCP server; the search_knowledge tool is callable from an agent session.
The same four edges that matter in a client build — local/cost-controlled, auth-scoped, fail-soft, and traceable — are exactly what this 50-test suite exercises. Small tool, full rails.
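To make the fail-soft and traceable edges concrete, here is a minimal sketch of how a search call might turn store failures into structured errors while keeping the citation fields on every hit. Everything except the source, heading, and chunk_index field names is an assumption: the Store interface, the safe_search wrapper, the error codes, and the in-memory stand-in stores are hypothetical and do not reflect the actual rag-mcp code.

```python
from dataclasses import dataclass, asdict
from typing import Any, Protocol


@dataclass
class Hit:
    """One cited passage; the citation fields follow the triple named in the text."""
    text: str
    source: str
    heading: str
    chunk_index: int


class Store(Protocol):
    """Hypothetical store interface, standing in for the real vector store."""
    def count(self) -> int: ...
    def query(self, text: str, k: int) -> list[Hit]: ...


def error(code: str, message: str) -> dict[str, Any]:
    """Build the structured error payload (shape is an assumption)."""
    return {"ok": False, "error": {"code": code, "message": message}, "results": []}


def safe_search(store: Store, query: str, k: int = 5) -> dict[str, Any]:
    """Never raises: every store failure becomes a structured error payload."""
    try:
        if store.count() == 0:
            return error("empty_store", "no documents have been ingested")
        hits = store.query(query, k)
    except Exception as exc:  # broad on purpose: the calling agent must not crash
        return error("store_unavailable", str(exc))
    return {"ok": True, "error": None, "results": [asdict(h) for h in hits[:k]]}


class FakeStore:
    """In-memory stand-in that returns its hits in insertion order."""
    def __init__(self, hits: list[Hit]) -> None:
        self._hits = hits

    def count(self) -> int:
        return len(self._hits)

    def query(self, text: str, k: int) -> list[Hit]:
        return self._hits[:k]


class DownStore:
    """Stand-in for an unreachable store: every call raises."""
    def count(self) -> int:
        raise ConnectionError("store offline")

    def query(self, text: str, k: int) -> list[Hit]:
        raise ConnectionError("store offline")
```

With these stand-ins, safe_search(DownStore(), "anything") returns a payload whose error code is store_unavailable instead of raising, which is the property a fail-soft test would assert.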
What it shows about how I work
A good MCP tool isn't measured by how many features it has — it's measured by whether the trust boundary holds, the failure modes are clean, and the output is verifiable. rag-mcp is a compact demonstration of all three, the way I'd build a retrieval tool against your corpus.