Backtesting & Performance

Methodology for validating strategies without fooling yourself.

Key Points

A backtest with fewer than 100 trades rarely supports statistical conclusions; the minimum sample size depends on the strategy's turnover and Sharpe ratio (the lower the Sharpe, the more trades are needed)
As a rule of thumb, out-of-sample performance should be 50-70% of in-sample; a much larger drop is a sign of in-sample overfitting, and near-equal performance warrants a check for look-ahead bias or data leakage
Walk-forward optimization is the minimum standard; rolling re-fit and re-test windows expose parameter decay instead of hiding it behind a single fit
The deflated Sharpe ratio adjusts for the number of strategy variants tested; without it, a high backtested Sharpe may be nothing more than the best of many random draws
Transaction costs are the #1 reason backtests fail in production; model slippage, commissions, and borrow fees explicitly
Regime analysis (bull/bear/sideways) is essential: a strategy that only works in one regime will blow up in the others
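
To make the transaction-cost point concrete, here is a minimal sketch that nets round-trip costs out of a trade's gross return. The function name `net_trade_return`, the basis-point inputs, and the 360-day borrow convention are all illustrative assumptions, not calibrated figures.

```python
def net_trade_return(gross_return, slippage_bps, commission_bps,
                     borrow_bps_annual=0.0, holding_days=0, is_short=False):
    """Return a trade's return after costs (hypothetical cost model)."""
    # Slippage and commission are paid on entry and again on exit.
    round_trip = 2 * (slippage_bps + commission_bps) / 10_000
    # Borrow fees accrue only on shorts, pro rata over the holding period
    # (a 360-day year is assumed here).
    borrow = 0.0
    if is_short:
        borrow = (borrow_bps_annual / 10_000) * (holding_days / 360)
    return gross_return - round_trip - borrow


# A 1% gross winner with 5 bps slippage and 1 bp commission per side.
print(net_trade_return(0.01, slippage_bps=5, commission_bps=1))
```

Even under these modest assumptions the round trip costs 12 bps, which matters most for high-turnover strategies with small per-trade edges.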

Split data into rolling in-sample / out-of-sample windows. Re-fit parameters on in-sample, evaluate on out-of-sample, then roll forward.
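
The rolling split described above can be sketched as an index generator. The name `walk_forward_windows` and its parameters are hypothetical; the sketch assumes fixed-length windows that roll forward by one out-of-sample block at a time, which is only one of several reasonable schemes (anchored or expanding windows are others).

```python
def walk_forward_windows(n_obs, in_sample, out_sample):
    """Yield (train, test) index ranges, rolling forward by out_sample."""
    if in_sample <= 0 or out_sample <= 0:
        raise ValueError("window lengths must be positive")
    start = 0
    # Stop once a full in-sample + out-of-sample pair no longer fits.
    while start + in_sample + out_sample <= n_obs:
        train = range(start, start + in_sample)
        test = range(start + in_sample, start + in_sample + out_sample)
        yield train, test
        start += out_sample  # out-of-sample blocks never overlap


for train, test in walk_forward_windows(n_obs=10, in_sample=4, out_sample=2):
    # Re-fit parameters on `train`, evaluate on `test`, then roll forward.
    print(list(train), list(test))
```

Because each test range starts exactly where its training range ends, no out-of-sample observation is ever used to fit the parameters that are evaluated on it.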

Test the strategy on 3+ uncorrelated instruments. If it only works on one, the edge is unlikely to be real.

Always report the deflated Sharpe ratio alongside the raw Sharpe, accounting for the number of trials.
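
For reference, here is a minimal sketch of the deflated Sharpe ratio as published by Bailey and López de Prado, where the result is a probability that the true Sharpe exceeds the best Sharpe expected from N unskilled trials. It is not the shorthand expression listed with the other formulas in this section. The function names are hypothetical, `sr` is assumed to be the per-period (non-annualized) Sharpe, and `sr_variance` is assumed to be the variance of the Sharpe estimates across the variants tried.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sr_variance):
    """Expected best Sharpe among n_trials strategies with no true edge."""
    if n_trials < 2:
        raise ValueError("need at least 2 trials")
    nd = NormalDist()
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * nd.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * nd.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_trials, sr_variance, n_periods,
                    skew=0.0, kurtosis=3.0):
    """Probability that the observed Sharpe beats the expected maximum."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    # Non-normal returns (skew, fat tails) widen the Sharpe estimate's error.
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    z = (sr - sr0) * math.sqrt(n_periods - 1) / denom
    return NormalDist().cdf(z)


# Same raw Sharpe, more variants tried: the deflated value falls.
print(deflated_sharpe(0.1, n_trials=10, sr_variance=0.01, n_periods=250))
print(deflated_sharpe(0.1, n_trials=100, sr_variance=0.01, n_periods=250))
```

Reporting this value next to the raw Sharpe makes the cost of each extra variant visible: the benchmark the strategy must beat rises with the number of trials.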

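Deflated Sharpe (rough shorthand): SR × (1 − γ × log(N) / (2T)) where N = # variants, T = # periods; γ is a penalty constant that is not defined here. The published deflated Sharpe ratio (Bailey and López de Prado) is instead a probability: Φ((SR − SR₀) × √(T − 1) / √(1 − skew × SR + (kurtosis − 1) / 4 × SR²)), where SR₀ is the expected maximum Sharpe across the N variants and Φ is the standard normal CDF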
Profit factor: Gross profit / gross loss
Expectancy: (win% × avg win) − (loss% × avg loss)
Calmar ratio: CAGR / |max drawdown|
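
The last three formulas can be sketched directly in code. The function names are hypothetical, and the sketch assumes `trade_pnls` is a non-empty list of per-trade profit and loss figures and `equity_curve` is a list of positive account values spanning `years` years.

```python
import math


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


def expectancy(trade_pnls):
    """(win% x avg win) - (loss% x avg loss); break-even trades count as losses of 0."""
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p <= 0]
    n = len(trade_pnls)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return (len(wins) / n) * avg_win - (len(losses) / n) * avg_loss


def calmar_ratio(equity_curve, years):
    """CAGR divided by the absolute maximum peak-to-trough drawdown."""
    cagr = (equity_curve[-1] / equity_curve[0]) ** (1 / years) - 1
    peak, max_dd = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        max_dd = max(max_dd, (peak - value) / peak)
    return math.inf if max_dd == 0 else cagr / max_dd


pnls = [100, -50, 200, -50]
print(profit_factor(pnls), expectancy(pnls))
print(calmar_ratio([100, 120, 90, 121], years=2))
```

Note that expectancy computed this way equals the mean PnL per trade, which is a quick sanity check on the win and loss averages.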