Strategy Overfitting Score
Is your backtest a genuine edge, or a well-fitted curve? Enter your strategy's parameters and get a reliability score across five dimensions: degrees of freedom, sample size, history coverage, Sharpe plausibility, and IS/OOS retention.
Backtest Parameters
Count every tunable input: period lengths, thresholds, multipliers, etc.
More trades per parameter means more statistical reliability.
Length of data used to develop and optimise the strategy.
Annualised Sharpe from the backtest. Be honest — use the unoptimised run.
Optional (improves accuracy)
Sharpe on data the strategy never saw during development. Adds up to 20 points.
How many parameter sets did you try before picking this one? Used for Deflated Sharpe adjustment.
Reliability Score
Reasonable foundation, but gaps exist. Address the low-scoring components before committing capital. At a minimum, add walk-forward validation.
Borderline — more trades recommended
Moderate sample — acceptable but more is better
Reasonable history — ideally includes a volatile period
Sharpe looks plausible after adjusting for number of trials
Not provided — strongly recommended to validate with out-of-sample data
IS/OOS retention is worth up to 20 points, and poor retention is the single strongest signal of overfitting. Run your strategy on a held-out data period and enter the result above.
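As a rough illustration of what retention means, the sketch below divides the out-of-sample Sharpe by the in-sample Sharpe. The function name and the clamping to the 0 to 1 range are assumptions made for this example. How the calculator turns retention into points is not stated on this page, so that mapping is left out.

```python
def oos_retention(is_sharpe: float, oos_sharpe: float) -> float:
    """Fraction of the in-sample Sharpe that survives out-of-sample.

    Hypothetical helper: the clamping below is an assumption, not the
    calculator's documented behaviour.
    """
    if is_sharpe <= 0:
        raise ValueError("in-sample Sharpe must be positive to compute retention")
    # A negative OOS Sharpe counts as zero retention; values above 1.0 are
    # capped so a lucky OOS period is not rewarded beyond full retention.
    return max(0.0, min(1.0, oos_sharpe / is_sharpe))


retention = oos_retention(is_sharpe=1.8, oos_sharpe=0.9)
print(f"{retention:.0%} of the in-sample Sharpe was retained out-of-sample")
```

A strategy that keeps most of its in-sample Sharpe on unseen data is less likely to be a fitted curve than one whose Sharpe collapses.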
Based on the Probability of Backtest Overfitting framework (Bailey et al.) and degrees-of-freedom analysis. This is a directional heuristic, not a formal statistical test. Use alongside walk-forward analysis and out-of-sample testing.
Common questions
What is backtest overfitting?
Overfitting (curve fitting) occurs when a strategy is optimised so heavily on historical data that it captures noise rather than genuine market structure. It looks profitable in backtesting but fails live because the patterns it learned don't persist out-of-sample.
How many trades do I need?
Aim for at least 10 trades per optimised parameter. For a 5-parameter strategy, that's 50 trades, but 200+ is far more credible. Fewer trades mean wider confidence intervals around the Sharpe ratio.
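To see why the trade count matters, here is a minimal sketch that computes trades per parameter and an approximate 95% confidence interval for a per-trade (not annualised) Sharpe ratio. It uses the standard error sqrt((1 + SR^2 / 2) / n), which assumes independent, normally distributed returns. The function names and the example Sharpe of 0.2 per trade are illustrative assumptions, not values taken from the calculator.

```python
import math


def trades_per_parameter(n_trades: int, n_params: int) -> float:
    """Ratio the rule of thumb refers to (10 or more is the suggested floor)."""
    return n_trades / n_params


def sharpe_confidence_interval(sharpe: float, n_obs: int, z: float = 1.96):
    """Approximate CI for a per-observation Sharpe ratio.

    Assumes i.i.d. normal returns, so the standard error is
    sqrt((1 + 0.5 * SR^2) / n). Real trade returns are often fat-tailed
    and autocorrelated, which widens the true interval.
    """
    se = math.sqrt((1 + 0.5 * sharpe**2) / n_obs)
    return sharpe - z * se, sharpe + z * se


# Same hypothetical per-trade Sharpe, two different sample sizes.
for n_trades in (50, 200):
    low, high = sharpe_confidence_interval(0.2, n_trades)
    ratio = trades_per_parameter(n_trades, n_params=5)
    print(f"{n_trades} trades ({ratio:.0f} per parameter): "
          f"Sharpe CI [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval at 50 trades still includes zero, while at 200 trades it does not.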
What is the Deflated Sharpe Ratio?
An adjustment to the observed Sharpe that accounts for how many strategy variations you tested. If you tried 100 parameter sets, the best one can look good by chance alone. Enter the number of trials above to apply this correction.
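The idea can be sketched with the expected-maximum approximation from the Deflated Sharpe Ratio literature: with N independent trials and no true edge, the best observed Sharpe is expected to be roughly sqrt(V) * ((1 - g) * Z(1 - 1/N) + g * Z(1 - 1/(N * e))), where V is the variance of Sharpe ratios across trials, g is the Euler-Mascheroni constant and Z is the inverse normal CDF. The code below is a simplified illustration, not necessarily the calculator's exact formula. It assumes normally distributed returns, Sharpe ratios expressed per observation rather than annualised, and a `sharpe_std` input (the spread of Sharpe across trials) that the form above does not ask for.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials: int, sharpe_std: float) -> float:
    """Expected best Sharpe among n_trials strategies that have no true edge."""
    if n_trials < 2:
        return 0.0  # a single trial has nothing to be selected from
    inv = NormalDist().inv_cdf
    return sharpe_std * (
        (1 - EULER_GAMMA) * inv(1 - 1 / n_trials)
        + EULER_GAMMA * inv(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sharpe: float, n_obs: int, n_trials: int,
                    sharpe_std: float) -> float:
    """Probability that the true Sharpe exceeds the luck-only benchmark.

    Simplification: returns are assumed normal (skew 0, kurtosis 3).
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    z = ((sharpe - benchmark) * math.sqrt(n_obs - 1)
         / math.sqrt(1 + 0.5 * sharpe**2))
    return NormalDist().cdf(z)


# Hypothetical inputs: daily Sharpe of 0.10 over 250 days, Sharpe spread 0.10.
for trials in (1, 100):
    print(trials, round(deflated_sharpe(0.10, 250, trials, 0.10), 3))
```

The same observed Sharpe earns a much lower probability once it is known to be the best of 100 attempts rather than a single run.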
My score is low — now what?
Collect more data (extend your backtest period), reduce the number of free parameters, perform walk-forward optimisation, and test strictly out-of-sample. If the score stays low after that, the strategy may not have a genuine edge.
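For the walk-forward step, a minimal sketch of rolling train/test windows might look like the following. The function name, window sizes and rolling (rather than expanding) scheme are assumptions for illustration. Parameters would be re-optimised on each training window and evaluated only on the test window that follows it.

```python
def walk_forward_splits(n_obs: int, train_size: int, test_size: int):
    """Yield (train, test) index ranges for rolling walk-forward windows.

    Each test window starts exactly where its training window ends, so the
    strategy is never evaluated on data it was fitted to.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


# Hypothetical example: 1,000 bars, optimise on 500, test on the next 100.
splits = list(walk_forward_splits(n_obs=1000, train_size=500, test_size=100))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}, "
          f"test {test.start}-{test.stop - 1}")
```

Concatenating the results from the test windows gives an out-of-sample track record that can be compared against the in-sample backtest.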
Need a backtest you can actually trust?
We build backtesting frameworks with realistic fills, proper transaction costs, and walk-forward validation built in — not bolted on after the fact.