An honest validation harness — I kill what doesn't clear costs

// reliability story · rigor, not returns · paper-only, nothing live

forward-testing · cost-reality gate · fail-soft · real-time data · 186 tests (options-bot)

Up front, because honesty is the whole point of this page: this is not a trading-performance story. None of these systems are profitable and none are running live — the trading fleet is frozen on purpose. What I'm selling here is the engineering and the discipline to tell the truth about results, which is exactly what you want in someone wiring an agent into your production systems.

The problem

It is easy to build something that looks like it works. It is much harder to build the scaffolding that tells you honestly whether it actually works — and harder still to act on that answer when it's "no." Most side-project dashboards are built to flatter the builder. I wanted the opposite: infrastructure that would catch a system passing its own happy-path checks while quietly failing the real test.

What I built

A validation harness around a fleet of experimental strategies. Every system runs paper-only and is forward-tested against live conditions behind a hard cost-reality gate: a strategy has to clear its real frictions (fees, spread, slippage) over a meaningful sample, not just print a positive number on a backtest. When a system doesn't clear that bar, it gets retired, and the result is recorded as the answer rather than buried.
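A minimal sketch of what such a gate could look like in Python. The `PaperTrade` fields, the `min_trades` threshold of 500, and the verdict strings are illustrative assumptions, not the harness's actual code; the only idea taken from the text above is that the verdict is computed on P&L net of fees, spread, and slippage, and only once the sample is large enough.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PaperTrade:
    """One paper trade; all amounts in dollars (hypothetical schema)."""
    gross_pnl: float    # P&L before any frictions
    fees: float
    spread_cost: float
    slippage: float

    @property
    def net_pnl(self) -> float:
        # What the trade would have earned after real-world frictions.
        return self.gross_pnl - (self.fees + self.spread_cost + self.slippage)


def cost_reality_gate(trades: List[PaperTrade], min_trades: int = 500) -> str:
    """Return a verdict based on net P&L, never on gross P&L."""
    if len(trades) < min_trades:
        return "INSUFFICIENT_SAMPLE"  # too few trades to say anything honest
    net = sum(t.net_pnl for t in trades)
    return "CLEARS_COSTS" if net > 0 else "RETIRE"


# Made-up numbers: every trade is a gross winner that loses after costs.
trades = [PaperTrade(gross_pnl=1.00, fees=0.65, spread_cost=0.30, slippage=0.15)
          for _ in range(600)]
verdict = cost_reality_gate(trades)
```

With these made-up numbers the gross total is positive, yet the gate returns "RETIRE" because each trade is slightly negative once frictions are subtracted. That gap between gross and net is exactly what the gate exists to expose.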

Evidence: a strategy I killed

One strategy ("sage-bot") was taken to a real verdict and retired:

sage-bot: -$735.66 over 1,002 paper trades → does not clear costs → PARKED / retired.
That number is the deliverable. The harness did its job: it produced a trustworthy "no" on a meaningful sample instead of a flattering "maybe."

An engineering vignette: real-time data done right

The same fleet is where I solve genuinely hard integration problems. The options bot needed live option quotes from a tastytrade DXLink stream. The naive approach opened a fresh streamer for every contract lookup, producing a ~16-second handshake flood. Lookups timed out before the quote arrived and silently fell back to a rate-limited free source, so the data was wrong and the cause was invisible.

The fix is the kind of work a Sprint is made of:

One persistent streamer: a single daemon thread owns one asyncio loop, enters the DXLink stream once, and listens forever, writing the latest bid/ask to a thread-safe per-symbol cache.
Bounded waits: only a freshly subscribed symbol pays a short wait for its first quote; every lookup after that is a cache read.
Fail-soft + self-healing: on any error the caller gets a clean None, never a crash; the stream handler backs off exponentially (1s → 30s), reconnects automatically, and re-subscribes known symbols so the cache heals itself.
Honest paper mode: in paper mode an order is validated and logged but never submitted live; auth failures are caught and logged, and the bot exits cleanly.
Tested: 186 tests cover the streamer, signals, P&L accounting, market hours, and the data-source fallbacks.
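The streamer pattern in the list above can be sketched roughly as follows. This is not the options bot's code and it does not use the tastytrade SDK: `connect` is a hypothetical async factory standing in for the DXLink streamer, assumed to return an object with `async subscribe(symbols)` and an async-iterable `listen()` yielding `(symbol, bid, ask)`. The class name, the 2-second first-quote timeout, and the doubling schedule between the 1s and 30s bounds are likewise assumptions.

```python
import asyncio
import threading
from typing import Dict, Optional, Tuple

Quote = Tuple[float, float]  # (bid, ask)


class QuoteCache:
    """One daemon thread, one asyncio loop, one long-lived upstream stream."""

    def __init__(self, connect, first_quote_timeout=2.0,
                 min_backoff=1.0, max_backoff=30.0):
        self._connect = connect  # hypothetical async factory (see lead-in)
        self._timeout = first_quote_timeout
        self._min_backoff, self._max_backoff = min_backoff, max_backoff
        self._lock = threading.Lock()
        self._quotes: Dict[str, Quote] = {}
        self._first_quote: Dict[str, threading.Event] = {}
        self._stream = None  # only touched on the loop thread
        self._loop = asyncio.new_event_loop()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        asyncio.set_event_loop(self._loop)
        self._loop.run_until_complete(self._listen_forever())

    async def _listen_forever(self):
        delay = self._min_backoff
        while True:
            try:
                self._stream = await self._connect()
                with self._lock:
                    known = sorted(self._first_quote)
                if known:  # heal: re-subscribe every symbol seen so far
                    await self._stream.subscribe(known)
                async for symbol, bid, ask in self._stream.listen():
                    delay = self._min_backoff  # healthy again, reset backoff
                    with self._lock:
                        self._quotes[symbol] = (bid, ask)
                        event = self._first_quote.get(symbol)
                    if event is not None:
                        event.set()
            except Exception:
                pass  # fail soft; a real handler would log the error here
            self._stream = None
            await asyncio.sleep(delay)
            delay = min(delay * 2, self._max_backoff)  # exponential, capped

    async def _subscribe(self, symbol):
        try:
            if self._stream is not None:
                await self._stream.subscribe([symbol])
        except Exception:
            pass  # the reconnect path re-subscribes known symbols anyway

    def get_quote(self, symbol: str) -> Optional[Quote]:
        """Latest (bid, ask) for symbol, or None. Never raises."""
        try:
            with self._lock:
                is_new = symbol not in self._first_quote
                if is_new:
                    self._first_quote[symbol] = threading.Event()
                event = self._first_quote[symbol]
            if is_new:
                # Hand the subscription to the loop thread, then wait briefly.
                asyncio.run_coroutine_threadsafe(
                    self._subscribe(symbol), self._loop)
                event.wait(self._timeout)  # bounded: first lookup only
            with self._lock:
                return self._quotes.get(symbol)
        except Exception:
            return None
```

A caller would construct one `QuoteCache` at startup and call `get_quote("SYMBOL")` from any thread. In this sketch only the first lookup for a symbol can block, and only up to `first_quote_timeout`; later lookups return whatever is cached, which may be None if no quote has arrived yet.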

What it shows about how I work

Two things you want in a contractor touching your production stack: I build real-time integrations that fail soft and heal themselves instead of lying to you, and I report results straight — including the ones that say "kill it." A pretty dashboard is easy. Infrastructure that's willing to tell you bad news, and engineering that holds up when the upstream flakes, is the actual job.
