Free Tool

Strategy Overfitting Score

Is your backtest a genuine edge, or a well-fitted curve? Enter your strategy's parameters and get a reliability score across five dimensions: degrees of freedom, sample size, history coverage, Sharpe plausibility, and IS/OOS retention.

Backtest Parameters

Count every tunable input: period lengths, thresholds, multipliers, etc.

More trades per parameter means more statistical reliability.

Length of data, in months, used to develop and optimise the strategy.

Annualised Sharpe from the backtest. Be honest — use the unoptimised run.


Optional (improves accuracy)

Sharpe on data the strategy never saw during development. Adds up to 20 points.

How many parameter sets did you try before picking this one? Used for Deflated Sharpe adjustment.

Reliability Score

66/100 (Acceptable)

Reasonable foundation, but gaps exist. Address the low-scoring components before committing capital. At a minimum, add walk-forward validation.

Degrees of freedom: 20.0/30

Borderline — more trades recommended

Sample size: 8.0/20

Moderate sample — acceptable but more is better

History coverage: 10.0/15

Reasonable history — ideally includes a volatile period

Sharpe plausibility: 15.0/15

Sharpe looks plausible after adjusting for number of trials

IS/OOS retention: 0.0/20

Not provided — strongly recommended to validate with out-of-sample data
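
The component scores above sum to 53 of a possible 100, while the headline shows 66. One reading that reconciles the two is that, when no out-of-sample Sharpe is supplied, the total is rescaled to the 80 points that remain attainable (53/80 is roughly 66%). This is only an inference from the displayed numbers, not documented behaviour of the tool. A minimal sketch of that arithmetic, with hypothetical names throughout:

```python
# Component scores and maxima as displayed in the example result above.
components = {
    "degrees_of_freedom": (20.0, 30),
    "sample_size": (8.0, 20),
    "history_coverage": (10.0, 15),
    "sharpe_plausibility": (15.0, 15),
    "is_oos_retention": (0.0, 20),
}


def headline_score(components, oos_provided):
    """Assumed aggregation (not confirmed by the tool): sum the points and,
    if no OOS Sharpe was given, rescale by the points that were attainable."""
    earned = sum(points for points, _ in components.values())
    available = sum(
        maximum
        for name, (_, maximum) in components.items()
        if oos_provided or name != "is_oos_retention"
    )
    return round(100 * earned / available)


raw_total = sum(points for points, _ in components.values())
score = headline_score(components, oos_provided=False)
print(raw_total, score)
```

Under this assumption, a strategy that skips the out-of-sample check is not capped at 80, which is one more reason to read the per-component lines and not only the headline number.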

IS/OOS retention is worth 20 points and is the single strongest signal of overfitting. Run your strategy on a held-out data period and enter the result above.
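
One common way to express retention is the out-of-sample Sharpe divided by the in-sample Sharpe. The sketch below computes only that ratio. The function name and the example Sharpe values are illustrative, and the way the tool converts the ratio into points is not stated on this page, so no point mapping is attempted.

```python
def sharpe_retention(is_sharpe, oos_sharpe):
    """Fraction of the in-sample Sharpe that survives out of sample."""
    if is_sharpe <= 0:
        # A ratio against a zero or negative baseline is not interpretable.
        raise ValueError("in-sample Sharpe must be positive")
    return oos_sharpe / is_sharpe


# Hypothetical figures: Sharpe 1.6 during development, 0.8 on held-out data.
retention = sharpe_retention(is_sharpe=1.6, oos_sharpe=0.8)
print(f"Retention: {retention:.0%}")
```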

Based on the Probability of Backtest Overfitting framework (Bailey et al.) and degrees-of-freedom analysis. This is a directional heuristic, not a formal statistical test. Use alongside walk-forward analysis and out-of-sample testing.

Common questions

What is backtest overfitting?

Overfitting (curve fitting) occurs when a strategy is optimised so heavily on historical data that it captures noise rather than genuine market structure. It looks profitable in backtesting but fails live because the patterns it learned don't persist out-of-sample.

How many trades do I need?

Aim for at least 10 trades per optimised parameter. For a 5-parameter strategy that is 50 trades, but 200+ is far more credible. With fewer trades, the Sharpe ratio estimate has wide confidence intervals.
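
To see how trade count drives the width of that interval, the sketch below uses a standard large-sample approximation for the standard error of a Sharpe estimate, sqrt((1 + SR^2 / 2) / n). It assumes independent, roughly normal per-trade returns, which real strategies often violate. The per-trade Sharpe of 0.2 is a made-up figure, and this is not necessarily the calculation the tool performs.

```python
import math


def sharpe_confidence_interval(sharpe, n_trades, z=1.96):
    """Approximate 95% interval for a per-trade Sharpe estimate,
    assuming i.i.d. normal returns (an idealisation)."""
    std_err = math.sqrt((1 + 0.5 * sharpe ** 2) / n_trades)
    return sharpe - z * std_err, sharpe + z * std_err


# Same hypothetical per-trade Sharpe, two different sample sizes.
low_50, high_50 = sharpe_confidence_interval(0.2, n_trades=50)
low_200, high_200 = sharpe_confidence_interval(0.2, n_trades=200)

print(f"50 trades:  [{low_50:.2f}, {high_50:.2f}]")
print(f"200 trades: [{low_200:.2f}, {high_200:.2f}]")
```

With 50 trades the interval straddles zero, so the data cannot distinguish this strategy from one with no edge. With 200 trades the lower bound moves above zero.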

What is the Deflated Sharpe Ratio?

An adjustment to the observed Sharpe ratio that accounts for how many strategy variations you tested. If you tried 100 parameter sets, the best one can look good by chance alone, so the more trials you ran, the higher the bar the observed Sharpe must clear. Providing the number of trials above applies this correction.
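
The selection effect can be made concrete with the expected-maximum approximation used in the Deflated Sharpe Ratio literature: if N trials all have zero true Sharpe, the best observed Sharpe is still expected to sit well above zero. The sketch below computes only that benchmark. The full Deflated Sharpe Ratio also corrects for sample length, skewness and kurtosis, which are omitted here. The spread of Sharpe ratios across trials (0.5) is an assumed input, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_std_across_trials=1.0):
    """Approximate expected best Sharpe among n_trials strategies
    whose true Sharpe is zero (selection bias only)."""
    if n_trials < 2:
        return 0.0  # a single trial involves no selection
    inv_cdf = NormalDist().inv_cdf
    quantile_mix = (
        (1 - EULER_MASCHERONI) * inv_cdf(1 - 1 / n_trials)
        + EULER_MASCHERONI * inv_cdf(1 - 1 / (n_trials * math.e))
    )
    return sharpe_std_across_trials * quantile_mix


# Benchmark the observed Sharpe must beat, for a few trial counts.
for n in (10, 100, 1000):
    print(n, round(expected_max_sharpe(n, sharpe_std_across_trials=0.5), 2))
```

An observed Sharpe below the benchmark for your trial count is what pure luck would be expected to produce.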

My score is low — now what?

Collect more data (extend your backtest period), reduce the number of free parameters, perform walk-forward optimisation, and test strictly out-of-sample. If the score stays low after that, the strategy may not have a genuine edge.
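
Walk-forward optimisation, as suggested above, can be sketched as rolling windows: optimise on a training window, evaluate on the block that immediately follows, then slide forward by one test block. The function name and the window sizes (24 months in-sample, 12 months out-of-sample, over 60 months of data) are illustrative assumptions, not settings taken from the tool.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges. Each test window starts right
    after its training window, and test windows never overlap."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test block


splits = list(walk_forward_splits(n_obs=60, train_size=24, test_size=12))
for train, test in splits:
    print(f"optimise on months {train.start}-{train.stop - 1}, "
          f"test on months {test.start}-{test.stop - 1}")
```

Parameters are re-optimised on each training window, and only the concatenated test-window results count as out-of-sample performance.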

Need a backtest you can actually trust?

We build backtesting frameworks with realistic fills, proper transaction costs, and walk-forward validation built in — not bolted on after the fact.