{
  "tool": "refund",
  "scope": "billing:write",
  "retry": true
}
$ mcp serve --pin
deps locked
api-version pinned
Fixed-scope · 1–2 weeks · auth-scoped, fail-soft, tested

Production MCP servers for AI agents that safely act on your systems.

Your agent nails the demo. Then it has to act on a real internal tool — and the hand-rolled MCP server starts falling over. I turn that flaky integration into a production-grade server your team can trust, in a fixed-scope sprint.

The SDK wrapper is the easy part. The reason yours won't page you at 2am — fail-soft behavior, scoped auth, pinned versions — is the part I build.

179
passing tests · public engine
1–2 wks
fixed timeline, not open-ended
Py · Node
both runtimes covered
50 / 50
paid on signature & acceptance
try:
  refund(order)
except:
  crash 💥
  • Auth that isn't scoped: One over-broad credential and the agent can do anything that token can, across every tool surface. No isolation, no trust boundary.
  • No fail-soft behavior: A backend dies, the server throws, and the agent session crashes with it. One dead dependency takes the whole agent offline instead of returning a clean error.
  • Nothing pinned: Unpinned deps and an unpinned upstream API version mean the backend can silently change response shape, and the integration breaks with no warning.
  • No tests, no docs, no handoff: It runs on one laptop. No test suite on a clean checkout, no setup docs a teammate can follow, so your senior people end up babysitting it.
The problem

Hand-rolled MCP servers break at 2am.

“Just call the API” stopped being enough — your agent has to act on internal tools. Someone wired up an MCP server under deadline; it sails through the happy path, then a backend hiccups and takes the whole agent down with it. The hard parts are exactly the ones that got skipped.

What you get

The production layer — itemized.

One of your internal tools, shipped as a production-grade MCP server your team can run and trust. The scaffold is the fast start; the hardened layer on top is the deliverable — exactly what the commodity gigs leave out.

Fail-soft handling

A design, not a try/except. Backend-down, rate-limit, 401/403/404, malformed input, timeout: each returns a clean structured error that tells the agent whether and when to retry, and never crashes the session.

backend 503 · 429 rate-limit · 401 / 403 · timeout → structured error · safe to retry
circuit-breaker pattern
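A minimal Python sketch of the fail-soft idea described above. Everything named here (`ToolError`, `BackendError`, `classify`, `fail_soft`, the error codes and the retry delays) is a hypothetical illustration, not the mcp-factory API; the retry policy is an assumption you would set per backend.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolError:
    """Structured error handed back to the agent instead of an exception."""
    code: str                        # stable machine-readable label
    retryable: bool                  # may the agent try again?
    retry_after_s: Optional[float]   # suggested wait in seconds, if any
    message: str


class BackendError(Exception):
    """Hypothetical exception carrying the backend's HTTP status."""
    def __init__(self, status: int, retry_after_s: Optional[float] = None):
        super().__init__(f"backend returned {status}")
        self.status = status
        self.retry_after_s = retry_after_s


def classify(exc: Exception) -> ToolError:
    """Map a failure onto a structured error (assumed policy, not a standard)."""
    if isinstance(exc, TimeoutError):
        return ToolError("timeout", True, 2.0, "Backend timed out.")
    if isinstance(exc, ValueError):
        return ToolError("invalid_input", False, None, str(exc))
    if isinstance(exc, BackendError):
        if exc.status == 429:
            return ToolError("rate_limited", True, exc.retry_after_s or 1.0, "Rate limited.")
        if exc.status in (401, 403):
            return ToolError("not_authorized", False, None, "Token lacks access.")
        if exc.status == 404:
            return ToolError("not_found", False, None, "Resource does not exist.")
        if exc.status >= 500:
            return ToolError("backend_down", True, 5.0, "Backend unavailable.")
    return ToolError("internal", False, None, "Unexpected failure.")


def fail_soft(handler: Callable[..., object]) -> Callable[..., dict]:
    """Wrap a tool handler so it always returns a dict and never raises."""
    def wrapped(*args, **kwargs) -> dict:
        try:
            return {"ok": True, "result": handler(*args, **kwargs)}
        except Exception as exc:  # deliberate catch-all at the tool boundary
            err = classify(exc)
            return {
                "ok": False,
                "error": err.code,
                "retryable": err.retryable,
                "retry_after_s": err.retry_after_s,
                "message": err.message,
            }
    return wrapped
```

Wrapping a handler as `fail_soft(refund)` means a 429 comes back as a dict with `retryable` set to true and a wait hint, while a 403 comes back with `retryable` set to false, so the agent does not hammer a door it cannot open. A circuit breaker would sit in front of the handler and short-circuit to the `backend_down` error after repeated failures.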

Auth-scoping

Per-tool scoping and a trust boundary that's proven, not asserted — tested against a resource the token can see and one it can't, so you know the boundary actually holds.

trust-tier isolation
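One way to picture per-tool scoping and the resource-pair check, as a hedged Python sketch. The scope table, the `Token` shape, and the function names are assumptions made for illustration; a real server would read scopes and visibility from your identity provider rather than from an in-memory object.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical per-tool scope table, in the "billing:write" style.
REQUIRED_SCOPE = {
    "refund": "billing:write",
    "get_invoice": "billing:read",
}


@dataclass(frozen=True)
class Token:
    scopes: FrozenSet[str]
    visible_resources: FrozenSet[str]  # resources this credential may touch


class ScopeDenied(Exception):
    """Raised when a call falls outside the token's trust boundary."""


def authorize(token: Token, tool: str, resource_id: str) -> None:
    """Raise ScopeDenied unless the token covers both the tool and the resource."""
    required = REQUIRED_SCOPE.get(tool)
    if required is None or required not in token.scopes:
        raise ScopeDenied(f"token lacks scope for tool {tool!r}")
    if resource_id not in token.visible_resources:
        raise ScopeDenied(f"resource {resource_id!r} is outside the token's boundary")


def can_call(token: Token, tool: str, resource_id: str) -> bool:
    try:
        authorize(token, tool, resource_id)
        return True
    except ScopeDenied:
        return False


def boundary_holds(token: Token, tool: str, visible: str, hidden: str) -> bool:
    """The resource-pair check: allowed on the visible one, denied on the hidden one."""
    return can_call(token, tool, visible) and not can_call(token, tool, hidden)
```

The pair matters because a check that only exercises the visible resource passes even when there is no boundary at all; the hidden resource is what shows the denial path actually runs.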

Version-pinning

Two axes, not one: the dependency lockfile and the upstream API-version header — so a fresh install reproduces and the backend can't silently change response shape under you.

deps + API version
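The API-version axis can be sketched in a few lines of Python. The header name `X-Api-Version` and the version string are placeholders, since each upstream API names and formats its version differently; the dependency axis lives in your lockfile and is not shown here.

```python
import urllib.request

# Assumed values: the real header name and version come from the upstream API docs.
API_VERSION_HEADER = "X-Api-Version"
PINNED_API_VERSION = "2024-01-01"


class VersionDrift(Exception):
    """The backend answered with a different API version than the pinned one."""


def build_request(url: str) -> urllib.request.Request:
    """Build a request that always sends the pinned version header."""
    return urllib.request.Request(url, headers={API_VERSION_HEADER: PINNED_API_VERSION})


def check_response_version(headers: dict) -> None:
    """Raise if the response reports a version other than the pinned one.

    Header names are compared case-insensitively. A response that reports
    no version at all is accepted, which is a policy choice for this sketch.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    served = lowered.get(API_VERSION_HEADER.lower())
    if served is not None and served != PINNED_API_VERSION:
        raise VersionDrift(f"pinned {PINNED_API_VERSION}, backend served {served}")
```

Sending the header asks the backend for a fixed response shape; checking the echoed version turns a silent change into a loud, specific failure.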

Test suite

Hermetic tests that pass on a clean checkout — covering the tool surface and the real failure set — so reliability is something you can re-verify, not take on faith.

green on clean checkout
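"Hermetic" here means the tests never touch a real backend. A small Python sketch of the shape, using only the standard library; `make_refund_tool` and its injected `send` transport are hypothetical stand-ins for a real tool handler.

```python
import unittest
from typing import Callable


def make_refund_tool(send: Callable[[str], dict]) -> Callable[[str], dict]:
    """Build a refund tool around an injected transport (hypothetical shape)."""
    def refund(order_id: str) -> dict:
        if not order_id:
            return {"ok": False, "error": "invalid_input", "retryable": False}
        try:
            return {"ok": True, "result": send(order_id)}
        except TimeoutError:
            return {"ok": False, "error": "timeout", "retryable": True}
    return refund


class RefundToolTests(unittest.TestCase):
    def test_happy_path(self):
        tool = make_refund_tool(lambda order_id: {"refunded": order_id})
        self.assertEqual(tool("o-1"), {"ok": True, "result": {"refunded": "o-1"}})

    def test_timeout_is_reported_not_raised(self):
        def slow_backend(order_id):
            raise TimeoutError
        out = make_refund_tool(slow_backend)("o-1")
        self.assertEqual(out, {"ok": False, "error": "timeout", "retryable": True})

    def test_empty_order_id_never_reaches_backend(self):
        calls = []
        tool = make_refund_tool(lambda order_id: calls.append(order_id) or {})
        self.assertFalse(tool("")["ok"])
        self.assertEqual(calls, [])


def run_suite() -> unittest.TestResult:
    """Run the suite in-process; no network, no credentials, no shared state."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RefundToolTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the backend is passed in rather than imported, the failure cases are as cheap to test as the happy path, and the suite behaves the same on a fresh clone as on the machine it was written on.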

Setup docs

A README a teammate can follow to run it unaided — env, install, register, run. No “works on my machine,” no single point of failure on the one laptop it was built on.

teammate-runnable

Handoff walkthrough

One live walkthrough, registered and running in your environment, accepted against an explicit Definition of Done — plus a 14-day support window on delivered scope.

accepted vs. DoD

Why this is the whole point

Anyone can wrap the SDK and pass the happy path. The hardened layer above — the boundary that holds, the failure set that's handled, the install that reproduces — is what turns a demo into something your senior engineers don't have to babysit.

auth-scoped · fail-soft · version-pinned · tested · documented · handed off
the deliverable, not the demo
179 passing
74 tests
121 tests
How I work, shown in real builds

Case studies — no fabricated client wins.

These are my own public, tested builds — and an honest look at how I validate (and kill) systems. Every number links to the repo or write-up that backs it.

0

Scope & kickoff Day 0

Signed SOW + access. A scoping questionnaire picks the one tool, the actions that matter, the auth model, and what failure should look like. Kickoff call confirms it; we write the manifest.
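As a rough illustration of what a manifest could pin down, here is a Python sketch that validates one. The `tool`, `scope`, and `retry` fields mirror the sample at the top of this page; `auth_model`, `actions`, and the validation rules are assumptions for the example, not the actual manifest schema.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed set of auth models for this sketch.
ALLOWED_AUTH_MODELS = {"api-key", "oauth", "session"}


@dataclass(frozen=True)
class ToolManifest:
    tool: str
    scope: str
    retry: bool
    auth_model: str = "api-key"
    actions: Tuple[str, ...] = ()


def parse_manifest(raw: dict) -> ToolManifest:
    """Validate a raw manifest dict and report every problem at once."""
    problems = []
    tool = raw.get("tool")
    if not isinstance(tool, str) or not tool:
        problems.append("tool must be a non-empty string")
    scope = raw.get("scope")
    if not isinstance(scope, str) or ":" not in scope:
        problems.append("scope must look like 'area:permission'")
    retry = raw.get("retry")
    if not isinstance(retry, bool):
        problems.append("retry must be true or false")
    auth_model = raw.get("auth_model", "api-key")
    if not isinstance(auth_model, str) or auth_model not in ALLOWED_AUTH_MODELS:
        problems.append(f"auth_model must be one of {sorted(ALLOWED_AUTH_MODELS)}")
    if problems:
        raise ValueError("; ".join(problems))
    return ToolManifest(tool, scope, retry, auth_model, tuple(raw.get("actions", ())))
```

Collecting all problems before raising keeps the scoping loop short: one round trip surfaces every gap in the questionnaire answers instead of one per attempt.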

1

Scaffold & build Days 1–3

Scaffold via mcp-factory, then implement the handlers for your tool. (If the API is undocumented, an explicit discovery line-item maps endpoints, auth, and real response shapes first — that's the real risk axis.)

2

Harden — the production layer Days 4–6

Where the value lives: auth-scoping, fail-soft across the real failure set, version-pinning on both axes, and error messages that tell the agent when to retry.

3

Test & validate live Days 7–8

Test suite green on a clean checkout, then live hub validation in your environment — including the auth-scope boundary proven against a private resource pair.

4

Handoff & accept Days 9–10

Setup docs, a live walkthrough, and acceptance against the Definition of Done — then a 14-day support window on delivered scope.

The sprint

Day 0 to handoff, without improvising.

A repeatable flow: scope → build → harden → handoff. Fixed scope, an explicit Definition of Done, and a 2-day buffer held for the inevitable auth or edge-case surprise.

The clock starts at access. The 1–2 week window is calendar time — dominated by access latency and your team's review availability, not 10 days of coding. A documented API is a focused 1–2 day build; protecting the auth/access long-pole is what keeps the sprint clean.
Pricing

Priced by scope, not by the hour.

The number comes after a short scoping questionnaire, never before. The ladder below is the shape; we land on the right rung once we've seen your one tool, its auth model, and whether the API is documented.

Readiness Audit
flat · $1,500
A 2–3 day read of your stack, delivered as a written MCP spec + risk report. No code — the report is the product.
  • Recommended tool surface + auth strategy
  • 3 risk flags + your estimated sprint tier
  • Or a working read-only spike, same price
  • Fully credited toward a sprint
Start with an audit
The sprint
Single-source
scope-dependent · $5k–$8k
One internal tool, shipped as a production-grade MCP server — the full hardened build.
  • Auth-scoping, fail-soft, version-pinning
  • Test suite, setup docs, handoff
  • 1–2 weeks · 50% / 50% terms
  • 14-day support window
Scope this build
Multi-source / platform
scope-dependent · $12k–$25k
More than one system, or a broader platform surface — a separate, larger statement of work.
  • Multiple tools / data sources
  • Scoped as its own engagement
  • Reliability retainer available after
Talk through scope
Scoping-gated · the questionnaire sets the tier before any SOW · 50% on signature, 50% on acceptance
Before you reach out

FAQ

Do you work hourly?

No. Work is fixed-scope sprints, scoped and quoted flat after a short scoping questionnaire. The entry point is a $1,500 MCP Readiness Audit (written report, no code), credited toward a Sprint if you proceed within 60 days.

How long does a Sprint take?

A typical Integration Sprint ships a working, tested server in one to two weeks of calendar time. Multi-source and self-hosted builds are larger and scoped separately.

What if I already have an MCP server?

The audit covers that too — a read of the existing server for auth-scoping gaps, fail-soft handling, version-pinning, and test coverage, delivered as a written risk log. The MCP spec is still moving; existing servers will need updates as it changes.

What stack do you support?

Python and Node MCP servers over REST, on cloud or self-hosted / on-prem. Auth via API-key, OAuth, or session-scoping. If your tools speak HTTP, they can be wired.

Can I see proof before committing?

That's the whole point. mcp-factory is public with 179 passing tests, and there's a 90-second Loom demo showing the test run, auth-scoping, and fail-soft handling live.

Do you have client testimonials?

Not yet — and I won't fabricate any. Instead I publish tested code you can check yourself and a case study on how I validate systems honestly (including killing ones that don't work). That's the trust signal, in place of borrowed logos.

Get in touch

Got an agent that needs to touch an internal tool?

Tell me the one system you'd wire in first. I'll send a short scoping questionnaire, set the tier, and you'll know exactly what you're getting before anything is signed.

No call required to start — a working repo and a clear scope beat a sales pitch.