{
  "tool": "refund",
  "scope": "billing:write",
  "retry": true
} $ mcp serve --pin
✓ deps locked
✓ api-version pinned

Fixed-scope · 1–2 weeks · auth-scoped, fail-soft, tested

Production MCP servers for AI agents that safely act on your systems.

Your agent nails the demo. Then it has to act on a real internal tool — and the hand-rolled MCP server starts falling over. I turn that flaky integration into a production-grade server your team can trust, in a fixed-scope sprint.

The SDK wrapper is the easy part. The reason yours won't page you at 2am — fail-soft behavior, scoped auth, pinned versions — is the part I build.

Scope your sprint See the live proof

✓ auth-scoped ✓ fail-soft · 503 → retry

billing.manifest.yaml

# one tool → one server
server: internal-billing
runtime: python
tool: refund
auth: scoped # billing:write
on_fail: soft

mcp-factory → scaffold

internal-billing · MCP server live

Auth-scoped — trust boundary proven, not asserted
Fail-soft — backend down → clean error, agent lives
Version-pinned — deps + upstream API header
Tested — green on a clean checkout

179

passing tests · public engine

1–2 wks

fixed timeline, not open-ended

Py · Node

both runtimes covered

50 / 50

paid on signature & acceptance

try:
refund(order)
except:
crash 💥

Auth that isn't scopedOne over-broad credential and the agent can do anything that token can — across every tool surface. No isolation, no trust boundary.
No fail-soft behaviorA backend dies, the server throws, and the agent session crashes with it. One dead dependency takes the whole agent offline instead of returning a clean error.
Nothing pinnedUnpinned deps and an unpinned upstream API version mean the backend can silently change response shape — and the integration breaks with no warning.
No tests, no docs, no handoffIt runs on one laptop. No test suite on a clean checkout, no setup docs a teammate can follow — so it's not something your senior people should be babysitting.

The problem

Hand-rolled MCP servers break at 2am.

“Just call the API” stopped being enough — your agent has to act on internal tools. Someone wired up an MCP server under deadline; it sails through the happy path, then a backend hiccups and takes the whole agent down with it. The hard parts are exactly the ones that got skipped.

What you get

The production layer — itemized.

One of your internal tools, shipped as a production-grade MCP server your team can run and trust. The scaffold is the fast start; the hardened layer on top is the deliverable — exactly what the commodity gigs leave out.

Fail-soft handling

A design, not a try/except. Backend-down, rate-limit, 401/403/404, malformed input, timeout — each returns a clean structured error that tells the agent when to retry, never crashing the session.

backend 503 429 rate-limit 401 / 403 timeout → structured error · safe to retry

circuit-breaker pattern

Auth-scoping

Per-tool scoping and a trust boundary that's proven, not asserted — tested against a resource the token can see and one it can't, so you know the boundary actually holds.

trust-tier isolation

Version-pinning

Two axes, not one: the dependency lockfile and the upstream API-version header — so a fresh install reproduces and the backend can't silently change response shape under you.

deps + API version

Test suite

Hermetic tests that pass on a clean checkout — covering the tool surface and the real failure set — so reliability is something you can re-verify, not take on faith.

green on clean checkout

Setup docs

A README a teammate can follow to run it unaided — env, install, register, run. No “works on my machine,” no single point of failure on the one laptop it was built on.

teammate-runnable

Handoff walkthrough

One live walkthrough, registered and running in your environment, accepted against an explicit Definition of Done — plus a 14-day support window on delivered scope.

accepted vs. DoD

Why this is the whole point

Anyone can wrap the SDK and pass the happy path. The hardened layer above — the boundary that holds, the failure set that's handled, the install that reproduces — is what turns a demo into something your senior engineers don't have to babysit.

auth-scoped fail-soft version-pinned tested documented handed off

the deliverable, not the demo

✓ 179 passing
✓ 74 tests
✓ 121 tests

Proof you can check

Don't take my word for it — read the code.

The capability is public and checkable: an open-source engine with a real test suite, and a production read-only server built end-to-end against the exact delivery process below.

✓ Every claim on this page maps to a real artifact. First client case studies coming soon — no fabricated names or numbers here. The proof is the code you can read today.

90-second walkthrough

Watch the engine go from a YAML manifest to a live MCP serve call — the fastest way to see exactly what a sprint delivers, no checkout required.

▶ ~90 sec manifest → live serve no setup to watch

Watch the demo

mcp-factory

A manifest-driven MCP server scaffolder and runtime hub. Write a YAML manifest, get a Claude MCP server scaffold — with batch registration and a live routing hub. The hardened production layer is built on top, per engagement.

✓ 179 passing tests Python + Node open source

View on GitHub

github-mcp

Read+write over a real external API — the exact common client ask. A public GitHub REST API MCP server built against the same delivery kit: write tools off by default, typed rate-limit handling, fail-soft + version-pinned.

✓ 74 passing tests read + write public repo

Read the case study

How I work, shown in real builds

Case studies — no fabricated client wins.

These are my own public, tested builds — and an honest look at how I validate (and kill) systems. Every number links to the repo or write-up that backs it.

// mcp-factory

A manifest-driven MCP scaffolder + runtime. 179 passing tests, public repo, 90-second Loom demo of a live test run.

✓ 179 testsscaffold

Read the case study

// github-mcp

Read+write over a real external API — the exact common client ask. Write tools off by default, typed rate-limit handling, 74 passing tests, public repo.

✓ 74 testsread + write

Read the case study

// desktop-mcp

OS-level control when there's no API: screenshot, windows, input, screen-recording. Input off by default, rate-capped, 121 passing tests, public repo.

✓ 121 testsOS-level

Read the case study

// rag-mcp

Retrieval-augmented search exposed as an MCP tool. Local ONNX embeddings, fail-soft, 50 tests, $0 inference cost.

✓ 50 tests$0 inference

Read the case study

// honest validation harness

The reliability story: how I forward-test systems honestly and retire the ones that don't earn their keep. Rigor, not returns.

validationno fabricated returns

Read the case study

scope →
build →
harden →
handoff

Scope & kickoff Day 0

Signed SOW + access. A scoping questionnaire picks the one tool, the actions that matter, the auth model, and what failure should look like. Kickoff call confirms it; we write the manifest.

Scaffold & build Days 1–3

Scaffold via mcp-factory, then implement the handlers for your tool. (If the API is undocumented, an explicit discovery line-item maps endpoints, auth, and real response shapes first — that's the real risk axis.)

Harden — the production layer Days 4–6

Where the value lives: auth-scoping, fail-soft across the real failure set, version-pinning on both axes, and error messages that tell the agent when to retry.

Test & validate live Days 7–8

Test suite green on a clean checkout, then live hub validation in your environment — including the auth-scope boundary proven against a private resource pair.

Handoff & accept Days 9–10

Setup docs, a live walkthrough, and acceptance against the Definition of Done — then a 14-day support window on delivered scope.

The sprint

Day 0 to handoff, without improvising.

A repeatable flow: scope → build → harden → handoff. Fixed scope, an explicit Definition of Done, and a 2-day buffer held for the inevitable auth or edge-case surprise.

The clock starts at access. The 1–2 week window is calendar time — dominated by access latency and your team's review availability, not 10 days of coding. A documented API is a focused 1–2 day build; protecting the auth/access long-pole is what keeps the sprint clean.

Pricing

Priced by scope, not by the hour.

The number comes after a short scoping questionnaire, never before. The ladder below is the shape; we land on the right rung once we've seen your one tool, its auth model, and whether the API is documented.

Readiness Audit

flat$1,500

A 2–3 day read of your stack, delivered as a written MCP spec + risk report. No code — the report is the product.

Recommended tool surface + auth strategy
3 risk flags + your estimated sprint tier
Or a working read-only spike, same price
Fully credited toward a sprint

Start with an audit

The sprint

Single-source

scope-dependent$5k–$8k

One internal tool, shipped as a production-grade MCP server — the full hardened build.

Auth-scoping, fail-soft, version-pinning
Test suite, setup docs, handoff
1–2 weeks · 50% / 50% terms
14-day support window

Scope this build

Multi-source / platform

scope-dependent$12k–$25k

More than one system, or a broader platform surface — a separate, larger statement of work.

Multiple tools / data sources
Scoped as its own engagement
Reliability retainer available after

Talk through scope

⬡ Scoping-gated · the questionnaire sets the tier before any SOW · 50% on signature, 50% on acceptance

Before you reach out

FAQ

Do you work hourly?

No. Work is fixed-scope sprints, scoped and quoted flat after a short call. The entry point is a $1,500 MCP Readiness Audit (written report, no code), credited toward a Sprint if you proceed within 60 days.

How long does a Sprint take?

A typical Integration Sprint ships a working, tested server in about two weeks. Multi-source and self-hosted builds are larger and scoped separately.

What if I already have an MCP server?

The audit covers that too — a read of the existing server for auth-scoping gaps, fail-soft handling, version-pinning, and test coverage, delivered as a written risk log. The MCP spec is still moving; existing servers will need updates as it changes.

What stack do you support?

Python and Node MCP servers over REST, on cloud or self-hosted / on-prem. Auth via API-key, OAuth, or session-scoping. If your tools speak HTTP, they can be wired.

Can I see proof before committing?

That's the whole point. mcp-factory is public with 179 passing tests, and there's a 90-second Loom demo showing the test run, auth-scoping, and fail-soft handling live.

Do you have client testimonials?

Not yet — and I won't fabricate any. Instead I publish tested code you can check yourself and a case study on how I validate systems honestly (including killing ones that don't work). That's the trust signal, in place of borrowed logos.

Get in touch

Got an agent that needs to touch an internal tool?

Tell me the one system you'd wire in first. I'll send a short scoping questionnaire, set the tier, and you'll know exactly what you're getting before anything is signed.

No call required to start — a working repo and a clear scope beat a sales pitch.

Email me your one tool