Backtesting Guide — Principles & MarketBook Replay
This backtesting guide explains why backtesting matters, the core principles you must adopt, common dead-ends teams face, and a reproducible MarketBook replay approach for Betfair-style backtests. Follow the steps and use the included scripts to validate execution fidelity before you pilot live.
Why backtesting matters
Backtesting turns an idea into an experiment you can measure and reason about. Therefore, you must treat backtests as controlled experiments: state assumptions, run reproducible tests, measure sensitivity, and validate execution fidelity. When you follow that approach you reduce the chance that a promising paper result surprises you in production.
Core principles
Reproducibility
Record every run: data version, git SHA, parameter values and random seeds. Save this as a run manifest so you can reproduce results and audit differences later.
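A minimal sketch of writing such a manifest (the write_manifest helper and its field names are illustrative, not a fixed format):

# sketch: persist a run manifest (field names are illustrative)
import json, hashlib, subprocess
from datetime import datetime, timezone
from pathlib import Path

def write_manifest(path, data_version, params, seed, snapshot_files):
    manifest = {
        "created": datetime.now(timezone.utc).isoformat(),
        "git_sha": subprocess.check_output(
            ["git", "rev-parse", "HEAD"]).decode().strip(),
        "data_version": data_version,
        "params": params,
        "seed": seed,
        # checksums catch silently changed input data on later reruns
        "snapshot_checksums": {
            f: hashlib.sha256(Path(f).read_bytes()).hexdigest()
            for f in snapshot_files
        },
    }
    Path(path).write_text(json.dumps(manifest, indent=2))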
Match fidelity to decision
If you are only tuning indicators, minute or hourly bars often suffice. If your decisions concern execution tactics (limit vs market orders, queue position), use ladder-level MarketBook snapshots. Increase fidelity only where it impacts decisions.
Explicit execution model
Simulate fills, partial fills, latency, and fees in code, and write tests covering simple deterministic scenarios so you know your simulator behaves as expected.
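For example, a deterministic scenario test for the ladder walk shown later in this guide (a sketch; it assumes consume_ladder returns the fills plus any unfilled remainder, as in the execution-modelling section below):

# sketch: deterministic scenario test for the fill simulator
def test_market_order_walks_two_levels():
    ladder = [(2.0, 100.0), (2.02, 80.0)]  # best-first (price, size)
    fills, remaining = consume_ladder(ladder, 150.0)
    assert fills == [(2.0, 100.0), (2.02, 50.0)]  # 100 at best, 50 at next level
    assert remaining == 0.0  # fully filled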
Robustness, not just returns
Perturb slippage, latency and parameters, and build a robustness matrix that shows how metrics drift. Prefer strategies that remain stable under realistic perturbations.
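One way to build that matrix is a small grid over perturbations (a sketch; run_backtest and the metric names are stand-ins for your own entry point):

# sketch: robustness matrix over slippage/latency perturbations
import itertools

def robustness_matrix(run_backtest, base_params):
    rows = []
    for slip_ticks, latency_ms in itertools.product([0, 1, 2], [0, 100, 250]):
        metrics = run_backtest(base_params,
                               extra_slippage_ticks=slip_ticks,
                               latency_ms=latency_ms)
        rows.append((slip_ticks, latency_ms, metrics["pnl"], metrics["sharpe"]))
    return rows  # inspect how the metrics drift as perturbations grow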
Common dead-ends and fixes
Optimistic fills
Many guides assume fills at bar-close or midpoint. That assumption inflates returns. Instead, walk the ladder and consume volume at price levels to compute realistic fills.
Overfitting via mass parameter search
Large grid searches over the full dataset fit noise rather than signal. To avoid that, restrict hyperparameter search to training windows and validate on out-of-sample periods (nested validation).
Missing operational gates
A strategy that backtests well can still fail in production if it ships without reconciliation, alerts, and a kill-switch. Include these operational checks as deployment gates; a minimal gate is sketched below.
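A gate can be as simple as a script that refuses to promote a run until every operational check passes (a sketch; the check names are illustrative):

# sketch of a deployment gate: refuse to promote until all checks pass
def deployment_gate(checks):
    failures = [name for name, passed in checks.items() if not passed]
    if failures:
        raise RuntimeError(f"deployment blocked, failed checks: {failures}")

# example: deployment_gate({"reconciliation_tested": True,
#                           "alerts_configured": True,
#                           "kill_switch_tested": False})  # raises, blocking the pilot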
Data & storage
Collect MarketBook snapshots with publishTime (UTC), marketId, runner list, availableToBack and availableToLay arrays, and totalMatched. Store raw snapshots immutably (JSONL or versioned S3) and keep checksums and capture script versions.
{
  "marketId": "1.23456789",
  "publishTime": "2025-01-01T12:00:00.123Z",
  "runners": [
    {
      "selectionId": 12345,
      "availableToBack": [{"price": 2.0, "size": 100.0}],
      "availableToLay": [{"price": 2.02, "size": 80.0}],
      "totalMatched": 1234.5
    }
  ]
}
Validate each snapshot against a JSON Schema before replay. The included ingest script performs schema checks and writes normalized JSONL for deterministic replays.
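If you want to replicate the check outside the ingest script, the jsonschema package covers it (a sketch; marketbook_schema.json is the schema listed under resources below):

# sketch: validate snapshots line-by-line before replay
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

with open("marketbook_schema.json") as fh:
    schema = json.load(fh)

def validate_jsonl(path):
    good, bad = [], []
    with open(path) as fh:
        for n, line in enumerate(fh, 1):
            snap = json.loads(line)
            try:
                validate(instance=snap, schema=schema)
                good.append(snap)
            except ValidationError as exc:
                bad.append((n, exc.message))
    return good, bad  # replay only validated snapshots; log the rest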
Execution modelling
Implement these behaviors in your simulator:
- Market orders: walk opposing ladder and consume volume until filled or exhausted.
- Limit orders: attempt an immediate match at the limit price, then rest any remaining stake in a virtual book (see the limit-order sketch after the code below).
- Partial fills: support remaining quantities and match them on subsequent snapshots.
- Latency: delay submissions by a configurable period to simulate processing and network delay.
- Fees & settlement: apply commission and handle SP/void rules where applicable.
# simplified consume-ladder walk for market orders
def consume_ladder(ladder, qty):
    """Walk a best-first (price, size) ladder, consuming size until qty fills."""
    fills, remaining = [], qty
    for price, avail in ladder:  # ladder sorted best price first
        if remaining <= 0:
            break
        fill = min(remaining, avail)
        fills.append((price, fill))
        remaining -= fill
    return fills, remaining  # non-zero remaining signals a partial fill
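Limit orders extend the same walk: match what you can immediately, then rest the remainder (a sketch for a back order, building on consume_ladder above; the resting_book shape is illustrative):

# sketch: limit back order, matching now and resting the remainder
def submit_limit_back(available_to_back, limit_price, qty, resting_book):
    # a backer can match at the limit price or better (higher odds)
    matchable = [(p, s) for p, s in available_to_back if p >= limit_price]
    fills, remaining = consume_ladder(matchable, qty)
    if remaining > 0:
        # unmatched stake rests as a virtual order, re-checked each snapshot
        resting_book.append({"price": limit_price, "size": remaining})
    return fills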
Validation & walk-forward
Use walk-forward cross-validation: train on past window → test on next window → roll forward. Wrap hyperparameter searches inside the train window and collect test-period metrics across folds. Prefer strategies that perform consistently across many windows and perturbations.
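A rolling split is only a few lines (a sketch; fit and evaluate stand in for your own training and scoring logic):

# sketch: walk-forward evaluation over rolling windows
def walk_forward(history, train_size, test_size, fit, evaluate):
    results, start = [], 0
    while start + train_size + test_size <= len(history):
        train = history[start:start + train_size]
        test = history[start + train_size:start + train_size + test_size]
        params = fit(train)        # hyperparameter search stays inside train
        results.append(evaluate(test, params))
        start += test_size         # roll the window forward
    return results                 # look for consistency across folds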
Practical MarketBook replay pattern
Run this deterministic sequence to replay and simulate fills (a minimal loop is sketched after the list):
- Ingest and validate snapshots (JSON Schema).
- Sort snapshots by publishTime (UTC).
- For each snapshot: match resting limit orders, then process strategy signals and submit orders at that timestamp.
- Record every fill with order_id, timestamp, price and size.
- At settlement apply commission and compute PnL; persist run manifest (git SHA, snapshot list, params).
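A minimal loop that implements the sequence above (a sketch; the strategy and simulator interfaces are illustrative placeholders):

# sketch: deterministic MarketBook replay loop
import json

def replay(snapshot_path, strategy, simulator):
    with open(snapshot_path) as fh:
        snapshots = [json.loads(line) for line in fh]
    # ISO-8601 UTC timestamps sort correctly as strings
    snapshots.sort(key=lambda s: s["publishTime"])
    for snap in snapshots:
        simulator.match_resting_orders(snap)        # resting limits first
        for order in strategy.on_snapshot(snap):    # then new signals
            simulator.submit(order, ts=snap["publishTime"])
    return simulator.fills  # each fill: order_id, timestamp, price, size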
Downloadable example scripts (replayer.py and ingest_marketbook.py) accompany this post and provide a starting implementation you can run locally.
TradingView — when to use it
Use TradingView to iterate on signals quickly and visualise trades, then export the signals and run them against MarketBook replays to evaluate execution. TradingView helps early-stage iteration; MarketBook replay tests execution realism.
Resources & downloads
Included resources (download from the resources page or request a ZIP):
- replayer.py — deterministic MarketBook replay simulator (example).
- ingest_marketbook.py — JSON Schema validator and normalizer.
- marketbook_schema.json — JSON Schema for snapshots.
- BACKTESTING-CHECKLIST-README.md — one-page checklist to gate deployments.
Checklist & deployment gates
- Archive raw snapshots and record checksums.
- Validate snapshots with JSON Schema.
- Version simulator and record git SHA in run manifest.
- Run walk-forward and nested validation.
- Stress-test slippage, latency and data gaps.
- Test kill-switch and reconciliation in staging before pilot.
