Backtesting & Performance
backtest, out-of-sample, walk-forward, overfitting, deflated Sharpe
Methodology for validating strategies without fooling yourself.
Key Points
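- As a rule of thumb, a backtest with fewer than about 100 trades rarely supports a statistically meaningful conclusion; the sample size actually required depends on the strategy’s turnover and Sharpe (a lower Sharpe needs a longer track record to distinguish from zero)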
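- As a rule of thumb, expect out-of-sample performance of roughly 50-70% of in-sample; a much larger drop points to in-sample overfitting, while near-equal performance warrants a check for data leakage or look-ahead bias before it is trusted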
- Walk-forward optimization is the minimum standard; rolling re-fit and re-test windows expose parameter decay and limit its impact, because parameters are periodically re-estimated on recent data
- The deflated Sharpe ratio adjusts for the number of strategy variants tested; without that adjustment, a high backtested Sharpe can be produced by chance alone once enough variants have been tried
- Transaction costs are among the most common reasons backtests fail in production; model slippage, commissions, and borrow fees explicitly
- Regime analysis (bull/bear/sideways) is essential: a strategy that only works in one regime is likely to lose heavily in the others
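The point about modeling costs explicitly can be made concrete with a small sketch. It assumes per-period gross returns and positions expressed as a fraction of equity; the function name `net_returns`, the flat `cost_bps` charge on turnover (standing in for commissions plus slippage), and the per-period `borrow_bps_per_period` charge on short exposure are all illustrative assumptions, not a model taken from any particular framework.

```python
def net_returns(gross_returns, positions, cost_bps=5.0, borrow_bps_per_period=0.0):
    """Subtract simple trading and borrow costs from per-period gross returns.

    gross_returns[i]: gross strategy return in period i.
    positions[i]:     position held in period i as a fraction of equity
                      (negative values are shorts).
    cost_bps:         assumed commission + slippage, in basis points of turnover.
    borrow_bps_per_period: assumed borrow fee, in basis points of short notional.
    """
    net = []
    prev_position = 0.0  # assume the strategy starts flat
    for gross, position in zip(gross_returns, positions):
        turnover = abs(position - prev_position)
        trade_cost = turnover * cost_bps / 1e4
        borrow_cost = max(-position, 0.0) * borrow_bps_per_period / 1e4
        net.append(gross - trade_cost - borrow_cost)
        prev_position = position
    return net
```

Comparing the Sharpe of `gross_returns` with that of `net_returns(...)` under a few pessimistic cost settings gives a quick read on how much of the backtested edge survives costs.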
Strategies
Walk-Forward Validation
Split data into rolling in-sample / out-of-sample windows. Re-fit parameters on in-sample, evaluate on out-of-sample, then roll forward.
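The rolling split described above might be sketched as follows. This is a minimal illustration using only the standard library; `walk_forward_windows`, `walk_forward`, and the `fit` / `evaluate` callables are hypothetical names, and a fixed-length (rather than expanding) in-sample window is assumed.

```python
def walk_forward_windows(n_obs, train_size, test_size, step=None):
    """Yield (train, test) index ranges for rolling walk-forward validation.

    By default the window advances by test_size, so the out-of-sample
    segments are contiguous and never overlap.
    """
    if step is None:
        step = test_size
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


def walk_forward(data, train_size, test_size, fit, evaluate):
    """Re-fit on each in-sample window, score on the following out-of-sample one.

    fit(train_data) -> params and evaluate(params, test_data) -> score are
    caller-supplied (hypothetical) callables.
    """
    scores = []
    for train, test in walk_forward_windows(len(data), train_size, test_size):
        params = fit([data[i] for i in train])
        scores.append(evaluate(params, [data[i] for i in test]))
    return scores
```

Because each test window starts strictly after its training window ends, parameters are never fitted on data they are later scored on; only the concatenated out-of-sample scores should be reported as the strategy's performance.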
Cross-Asset Robustness
Test the strategy on 3+ uncorrelated instruments. If it only works on one, the edge is unlikely to be real.
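One way to turn this check into a repeatable report is sketched below. It assumes the strategy has already been run on each instrument and that per-period returns are available; the `min_sharpe` threshold, the 252-period annualization, and the function names are illustrative assumptions. The sketch does not verify that the instruments are uncorrelated, which still has to be checked separately.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def robustness_report(returns_by_instrument, min_sharpe=0.5, min_passing=3):
    """Return per-instrument Sharpe ratios and whether enough instruments pass.

    returns_by_instrument maps an instrument name to the strategy's
    per-period returns on that instrument.
    """
    sharpes = {
        name: annualized_sharpe(rets)
        for name, rets in returns_by_instrument.items()
    }
    n_passing = sum(1 for s in sharpes.values() if s >= min_sharpe)
    return sharpes, n_passing >= min_passing
```

The parameters should be held fixed across instruments; re-tuning per instrument defeats the purpose of the test.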
Deflated Sharpe Reporting
Always report the deflated Sharpe ratio alongside the raw Sharpe, accounting for the number of trials.
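A sketch of how the deflated figure could be computed, assuming the formulation published by Bailey and López de Prado: the observed Sharpe is compared against the maximum Sharpe expected by chance across `n_trials` variants, and the result is the probability that the true Sharpe exceeds that benchmark. The function names are illustrative, the Sharpe inputs are assumed to be per-period (not annualized), and `sharpe_variance` (the variance of Sharpe ratios across the tested variants) is assumed to be supplied by the caller.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected maximum Sharpe across n_trials variants when the true Sharpe is 0."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist()
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, n_trials, sharpe_variance, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the chance benchmark.

    sr:      observed per-period Sharpe of the selected variant.
    n_obs:   number of return observations (T).
    skew, kurt: skewness and (non-excess) kurtosis of the returns;
                the defaults correspond to normally distributed returns.
    """
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

Holding everything else fixed, raising `n_trials` raises the chance benchmark and lowers the deflated value, which is why the number of variants tried needs to be logged honestly during research.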
Metrics & Formulas
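- Deflated Sharpe: Φ((SR − SR₀) × √(T − 1) / √(1 − γ₃ × SR + (γ₄ − 1) / 4 × SR²)), with SR₀ = √V × ((1 − γ) × Φ⁻¹(1 − 1/N) + γ × Φ⁻¹(1 − 1/(N × e))), where N = # variants, T = # periods, V = variance of Sharpe across variants, γ₃ and γ₄ = skewness and kurtosis of returns, γ ≈ 0.5772 (Euler-Mascheroni constant), Φ = standard normal CDF, and SR is per-period (not annualized)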
- Profit factor: Gross profit / gross loss
- Expectancy: (win% × avg win) − (loss% × avg loss)
- Calmar ratio: CAGR / |max drawdown|
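The trade-level and equity-curve formulas above translate directly into code. The sketch below assumes `trade_pnls` is a list of per-trade profit and loss values and `equity_curve` is a list of account values over `years` years; the function names and the convention of counting break-even trades as losses are illustrative choices.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def expectancy(trade_pnls):
    """(win% x avg win) - (loss% x avg loss), in the same units as the PnLs."""
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p <= 0]  # break-even counted as a loss
    n = len(trade_pnls)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return len(wins) / n * avg_win - len(losses) / n * avg_loss


def calmar_ratio(equity_curve, years):
    """CAGR divided by the absolute maximum peak-to-trough drawdown."""
    cagr = (equity_curve[-1] / equity_curve[0]) ** (1 / years) - 1
    peak = equity_curve[0]
    max_drawdown = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        max_drawdown = max(max_drawdown, (peak - value) / peak)
    return float("inf") if max_drawdown == 0 else cagr / max_drawdown
```

Expectancy computed this way equals the mean PnL per trade, which makes a convenient sanity check on the win-rate and average-size inputs.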
Tools & Resources
- vectorbt / backtesting.py: Python backtesting frameworks
- Zipline / Lean (QuantConnect): Full-featured open-source engines
- pyfolio: Tear-sheet and risk analytics (originally from Quantopian and no longer actively maintained there; a community fork, pyfolio-reloaded, exists)
- QuantStats: Separate, more recently maintained alternative to pyfolio