Monte Carlo Simulation for Trading Strategy Robustness

The problem with a single equity curve

A backtest produces one equity curve. It looks authoritative: every trade in the exact sequence the market delivered, compounding into one final number and one maximum drawdown. The trouble is that this sequence is an accident. The market handed you those trades in that order, but it could just as easily have handed you the same set of trades in a different order, and the curve would look nothing like the one you are staring at.

This is path dependency. Suppose a strategy takes 200 trades over two years with a positive expectancy. If the three worst losses happen to land early and consecutively, your equity dips hard before the edge has a chance to work, and the drawdown from that cluster could be twice what your backtest reported. The exact same 200 trades, with those three losses scattered across the two years, produce a far shallower drawdown. Same trades, same expectancy, completely different lived experience.

The drawdown number from a single backtest is therefore close to meaningless on its own. It tells you what happened in the one ordering you observed, not what could plausibly happen. The 35% drawdown in your report might be a lucky-low draw from a distribution whose realistic bad case is much worse. You size your account, set your kill switches, and decide whether you can stomach the system based on that single number, and that single number is the wrong thing to anchor on.
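
A small sketch makes the ordering effect concrete. The trade set (17 wins of +2% and 3 losses of -5%), the helper names, and the choice to treat each return as a fractional change in equity are illustrative assumptions, not figures from any real backtest.

```python
def max_drawdown(returns):
    """Largest peak-to-trough decline (as a fraction) of the compounded curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def terminal_equity(returns):
    """Final equity multiple after compounding every return in order."""
    equity = 1.0
    for r in returns:
        equity *= 1.0 + r
    return equity


# Hypothetical trade set: 17 wins of +2% and 3 losses of -5%.
wins, losses = [0.02] * 17, [-0.05] * 3

# Same trades, two orderings: losses back to back at the start vs spread out.
clustered = losses + wins
scattered = wins[:5] + losses[:1] + wins[5:11] + losses[1:2] + wins[11:] + losses[2:]

dd_clustered = max_drawdown(clustered)
dd_scattered = max_drawdown(scattered)
print(f"clustered: max DD {dd_clustered:.1%}, terminal {terminal_equity(clustered):.4f}")
print(f"scattered: max DD {dd_scattered:.1%}, terminal {terminal_equity(scattered):.4f}")
```

Both orderings end at the same terminal equity, yet the clustered ordering shows a much deeper maximum drawdown than the scattered one.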

What Monte Carlo simulation does

Monte Carlo simulation attacks path dependency directly. Instead of trusting one ordering, it generates thousands of plausible alternative orderings of the same edge and measures the distribution of outcomes across all of them. You stop asking "what happened?" and start asking "what range of things could have happened, and how bad is the bad end of that range?"

The mechanism is resampling. You take your list of trade returns, expressed as R multiples or percentages, and repeatedly draw a new sequence from them. Each sequence is compounded into a fresh equity curve with its own terminal value and its own maximum drawdown. Do this 10,000 times and you have 10,000 equity curves and 10,000 drawdowns to study.

Two structural choices change what the simulation tells you. Sampling without replacement simply reshuffles your existing trades, so every path uses each trade exactly once. Sampling with replacement draws trades independently each time, so a path can repeat some trades and omit others, which injects uncertainty about the edge itself, not just its ordering. A further variation randomizes entry timing or the gaps between trades, which matters when your strategy holds positions and is exposed to funding or overnight risk between fills.
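
The two sampling schemes can be sketched with Python's standard random module. The function names and the eight-trade list are hypothetical, and returns are assumed to be fractional changes in equity that compound.

```python
import random


def shuffled_path(trades, rng):
    """Sampling without replacement: every trade appears exactly once."""
    return rng.sample(trades, len(trades))


def bootstrap_path(trades, rng):
    """Sampling with replacement: trades can repeat or be left out."""
    return rng.choices(trades, k=len(trades))


def compound(returns, start=1.0):
    """Compound fractional returns into a terminal equity value."""
    equity = start
    for r in returns:
        equity *= 1.0 + r
    return equity


rng = random.Random(42)  # fixed seed so the sketch is repeatable
trades = [0.03, -0.02, 0.05, -0.04, 0.01, 0.02, -0.01, 0.04]  # hypothetical returns

shuffle_terminals = [compound(shuffled_path(trades, rng)) for _ in range(1000)]
bootstrap_terminals = [compound(bootstrap_path(trades, rng)) for _ in range(1000)]

shuffle_spread = max(shuffle_terminals) - min(shuffle_terminals)
bootstrap_spread = max(bootstrap_terminals) - min(bootstrap_terminals)
print("shuffle terminal spread:  ", shuffle_spread)
print("bootstrap terminal spread:", bootstrap_spread)
```

The shuffle spread is zero up to floating-point rounding, while the bootstrap spread is not, which is the practical difference between the two schemes.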

The main Monte Carlo methods for trading

Three approaches dominate, and they answer slightly different questions. Choosing the wrong one gives you a number that looks rigorous but measures something you did not intend to measure.

Trade-sequence shuffling
Reorder your actual trades without replacement. Terminal equity is identical on every path, because compounding multiplies the same set of returns and a product does not depend on the order of its factors. What changes is the drawdown path. This isolates pure ordering risk and answers: given exactly this set of trades, how deep could the drawdown have been if they arrived in a worse order? It says nothing about whether your sample of trades is representative.
Bootstrap resampling
Draw trades with replacement, building each path from random picks out of the trade pool. Now both drawdown and terminal equity vary, because a path can over-represent your big winners or your big losers. This captures sampling uncertainty: your historical trades are themselves a sample of the edge, and bootstrapping estimates how the edge might behave on a different sample. It is the more honest method for forward-looking risk, at the cost of assuming each trade is interchangeable.
Parametric simulation
Fit a distribution to your trade returns (or to win rate and average win/loss) and generate synthetic trades from it. This smooths over the lumpiness of a small trade sample and lets you stress-test fat tails by widening the distribution. The tradeoff is model risk: if you assume normal returns when your edge produces occasional large losses, the simulation will systematically understate tail drawdowns. Only useful when you genuinely understand the shape of your return distribution.
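
Shuffling and bootstrapping only rearrange existing trades, so the parametric route is the one that needs a model. Below is a minimal sketch of the crudest version, a two-outcome model built from win rate, average win, and average loss. The function names, the loss_scale stress knob, and the sample trades are assumptions for illustration. A model this simple has no tail at all, so it is exposed to exactly the understatement described above.

```python
import random


def fit_two_outcome_model(trades):
    """Summarise trades as win rate, average win, and average loss.

    Assumes the list contains at least one win and one loss.
    """
    wins = [t for t in trades if t > 0]
    losses = [t for t in trades if t <= 0]
    return {
        "win_rate": len(wins) / len(trades),
        "avg_win": sum(wins) / len(wins),
        "avg_loss": sum(losses) / len(losses),
    }


def synthetic_trades(model, n, rng, loss_scale=1.0):
    """Generate n synthetic trades; loss_scale > 1 deepens losses as a stress test."""
    return [
        model["avg_win"] if rng.random() < model["win_rate"]
        else model["avg_loss"] * loss_scale
        for _ in range(n)
    ]


rng = random.Random(0)
trades = [0.04, -0.02, 0.06, -0.03, 0.05, -0.01, 0.03, -0.02]  # hypothetical returns
model = fit_two_outcome_model(trades)

baseline = synthetic_trades(model, 200, rng)
stressed = synthetic_trades(model, 200, rng, loss_scale=1.5)  # losses 50% larger
print(model)
print("worst baseline trade:", min(baseline), "worst stressed trade:", min(stressed))
```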

In practice, shuffling and bootstrapping are the workhorses. Shuffling answers a clean question about the data you have; bootstrapping gives you a more conservative, forward-looking spread. Run both. If their drawdown distributions disagree sharply, that disagreement is itself information about how fragile your trade sample is.

What to measure across simulations

A Monte Carlo run is only as useful as the statistics you pull out of it. The point is not the average path; the average is roughly what your backtest already showed. The point is the tails. These are the numbers worth recording on every run:

Outputs across 10,000 simulated paths

Max drawdown distribution → median, 95th pct, 99th pct

Terminal equity distribution → 5th / 50th / 95th pct

Percent of paths that hit ruin → P(equity ≤ ruin threshold)

CAGR distribution → 5th / 50th / 95th pct

Interpretation anchors

5th-pct CAGR < 0 → realistic bad case loses money

P(ruin) > 0 → position size is too large somewhere

95th-pct DD ≫ backtest DD → backtest was a lucky path

The max drawdown distribution is the headline. The 95th-percentile drawdown is the figure you should size against, because it represents a genuinely plausible bad run rather than an apocalyptic one. The percent of paths that hit ruin converts abstract risk into a single hard number: at your chosen position size, in what fraction of simulated futures does the account fall below the level where you would stop trading? If that number is not zero, you are sized too aggressively. The 5th-percentile CAGR tells you what an unlucky but realistic annualized return over the whole test period looks like, which is far more decision-relevant than the median.
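
One way these outputs might be computed from bootstrap paths is sketched below. The nearest-rank percentile helper, the function names, the two-year span used to annualize, the 50%-of-starting-equity ruin level, and the trade list are all assumptions made for the example.

```python
import math
import random


def path_stats(returns):
    """Return (max drawdown, terminal equity, lowest equity) for one compounded path."""
    equity = peak = lowest = 1.0
    worst_dd = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        lowest = min(lowest, equity)
        worst_dd = max(worst_dd, 1.0 - equity / peak)
    return worst_dd, equity, lowest


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(trades, n_paths, years, ruin_level, rng):
    """Bootstrap n_paths paths and collect the tail statistics."""
    dds, cagrs, ruined = [], [], 0
    for _ in range(n_paths):
        path = rng.choices(trades, k=len(trades))  # bootstrap: with replacement
        dd, terminal, lowest = path_stats(path)
        dds.append(dd)
        cagrs.append(terminal ** (1.0 / years) - 1.0)
        ruined += lowest <= ruin_level  # equity fell to the ruin level at some point
    return {
        "dd_p50": percentile(dds, 50),
        "dd_p95": percentile(dds, 95),
        "dd_p99": percentile(dds, 99),
        "p_ruin": ruined / n_paths,
        "cagr_p05": percentile(cagrs, 5),
        "cagr_p50": percentile(cagrs, 50),
        "cagr_p95": percentile(cagrs, 95),
    }


rng = random.Random(7)
# Hypothetical trade returns (fractions of equity), assumed to span 2 years.
trades = [0.04, -0.02, 0.03, -0.03, 0.06, -0.02, 0.01, -0.04, 0.05, -0.01] * 6
stats = summarize(trades, n_paths=5000, years=2.0, ruin_level=0.5, rng=rng)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Terminal-equity percentiles can be collected the same way from the second value that path_stats returns.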

Before you trust any of these numbers, check the edge is real

Monte Carlo on an overfit strategy just simulates noise in ten thousand outfits. Run our Strategy Overfitting Score first to estimate how much of your backtest edge is curve-fitting before you spend compute reshuffling it.


Worked example: the drawdown your backtest hid

Take a realistic case. A trend-following crypto strategy backtests over 400 trades with a maximum drawdown of 35% and a CAGR of 28%. On paper, you decide a 35% drawdown is uncomfortable but survivable, so you commit capital at full size.

Now run a bootstrap Monte Carlo with 10,000 paths over those 400 trades:

10,000 bootstrap paths · 400 trades each

Backtest max drawdown: 35%

Median max drawdown: 41%

95th-pct max drawdown: 55%

99th-pct max drawdown: 63%

5th-pct CAGR: 9%

50th-pct CAGR: 27%

95th-pct CAGR: 44%

Paths hitting 50% account loss: ~14%

Read this carefully. Your backtest's 35% was not the typical case; it sat on the optimistic side of the distribution, below the 41% median. One realistic-but-unlucky run in twenty reaches 55% or worse. Roughly one path in seven loses half the account at some point. The strategy you signed off on as a "35% drawdown system" is actually a system where a 55% drawdown is a normal feature of the distribution, not a freak event.

This changes two decisions. First, survivability: can you keep trading the system through a 55% drawdown without pulling the plug? Most traders abandon a strategy in exactly that drawdown, locking in the loss right before the edge would have recovered them. Second, sizing: if 55% is intolerable, you halve your position size, which roughly halves the drawdown distribution and pushes the probability of a 50% account loss toward zero, at the cost of a lower CAGR. Monte Carlo turns "feels risky" into a sizing decision you can actually defend.
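
The effect of halving size can be checked on any single path by scaling each trade return before compounding. The sketch below assumes that returns scale linearly with position size and uses a hypothetical ten-trade path; the size parameter is an illustration, not a sizing model.

```python
def max_drawdown(returns, size=1.0):
    """Max peak-to-trough decline when every trade return is scaled by `size`."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + size * r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Hypothetical full-size path containing a rough losing stretch.
path = [0.06, -0.10, -0.12, 0.04, -0.15, -0.08, 0.10, 0.09, -0.05, 0.12]

full = max_drawdown(path, size=1.0)
half = max_drawdown(path, size=0.5)
print(f"full size: {full:.1%}   half size: {half:.1%}   ratio: {half / full:.2f}")
```

On this path the half-size drawdown comes out a little above half of the full-size one, because compounded losses do not scale exactly linearly. Re-running the full simulation at the proposed size is the reliable check.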

Limits and ways Monte Carlo misleads

Monte Carlo is not a truth machine. It rests on assumptions that trading data routinely violates, and when those assumptions break, the simulation can give you a confident answer that is quietly wrong.

It assumes trades are independent
Reshuffling treats each trade as an isolated draw, but real strategies cluster. A trend follower wins in runs during trends and bleeds in runs during chop. Shuffling those wins and losses apart destroys the streak structure, which usually makes the simulated drawdowns look milder than what serial losing streaks actually produce.
It assumes stationarity
Resampling assumes the trade distribution is fixed over time. Markets regime-shift: volatility expands and contracts, edges decay, correlations break. A strategy whose returns came mostly from one volatility regime will have its Monte Carlo built on data that may never recur, so the percentiles describe a past world, not the next one.
Reshuffling destroys autocorrelation and regime structure
Any time-ordering information in your returns (momentum in your own equity curve, drawdown-then-recovery dynamics, regime-conditional behavior) is erased the moment you shuffle. If your strategy's risk profile depends on these, the simulation removes the very thing you were trying to measure.
Garbage in, garbage out
This is the big one. Monte Carlo can only reshuffle the trades you feed it. If those trades came from an overfit, curve-fitted backtest with no real edge, the simulation produces ten thousand beautiful variations of noise. It will not warn you that the edge is fake; it may even make the noise look dependable. MC measures path risk, never edge validity.
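
A rough way to see whether the independence assumption is badly violated is to measure how strongly each trade's return is related to the next one. The lag-1 autocorrelation below is offered as one possible diagnostic under that framing, not as a complete test; the function name and both sample sequences are hypothetical.

```python
def lag1_autocorrelation(values):
    """Correlation between each value and the one that follows it."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # constant series: no structure to measure
    covariance = sum(
        (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance


# Hypothetical sequences: one streaky (runs of wins, then runs of losses),
# one that alternates win, loss, win, loss.
streaky = [0.02] * 10 + [-0.02] * 10
alternating = [0.02, -0.02] * 10

print("streaky:    ", round(lag1_autocorrelation(streaky), 2))
print("alternating:", round(lag1_autocorrelation(alternating), 2))
```

A value well above zero on your real trade sequence suggests streaks that shuffling will break up, so the simulated drawdowns should be read as a lower bound rather than a forecast.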

The honest framing: Monte Carlo answers "given that this edge is real and roughly stationary, how rough could the ride be?" It does not answer "is this edge real?" Confusing the two is the most common and most expensive mistake practitioners make with the technique.

A practical Monte Carlo workflow before going live

Here is the order of operations that keeps Monte Carlo honest and useful, rather than a comforting ritual you run on a strategy you have already decided to trade.

→Step 1: Establish the edge is real with out-of-sample and walk-forward testing before touching Monte Carlo. MC on an unvalidated edge is wasted compute.
→Step 2: Run 10,000 paths (5,000 at the very least), using both shuffling and bootstrap with replacement. Confirm tail percentiles are stable across two independent batches.
→Step 3: Set rejection thresholds in advance: P(ruin) must be zero at live size, 95th-pct drawdown must be tolerable, 5th-pct CAGR must be positive.
→Step 4: Size from the 95th-percentile drawdown, not the backtest drawdown. Re-run at the proposed size to confirm ruin probability stays at zero.
→Step 5: Combine with walk-forward to cover what MC cannot: regime change and edge decay over time, which reshuffling structurally ignores.
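
The rejection rules in Step 3 are simple enough to write down before any simulation runs. The sketch below uses the figures from the worked example; the dictionary keys, the function name, the 40% tolerance, and the choice to treat a 50% account loss as the ruin threshold are assumptions.

```python
def rejection_reasons(stats, max_tolerable_dd):
    """Return the pre-set rules the simulation output violates (empty list = pass)."""
    reasons = []
    if stats["p_ruin"] > 0.0:
        reasons.append("non-zero probability of ruin at live size")
    if stats["dd_p95"] > max_tolerable_dd:
        reasons.append("95th-pct drawdown exceeds tolerance")
    if stats["cagr_p05"] <= 0.0:
        reasons.append("5th-pct CAGR is not positive")
    return reasons


# Worked-example figures: ~14% of paths hit a 50% loss, 95th-pct DD 55%,
# 5th-pct CAGR 9%. The 40% tolerance is a hypothetical personal limit.
example = {"p_ruin": 0.14, "dd_p95": 0.55, "cagr_p05": 0.09}
failures = rejection_reasons(example, max_tolerable_dd=0.40)
print(failures if failures else "passes all pre-set thresholds")
```

Fixing the thresholds in code before looking at results makes it harder to rationalize a failing strategy after the fact.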

The two techniques are complementary, not redundant. Walk-forward tests whether the edge survives across time and regimes; Monte Carlo tests whether you can survive the edge's worst plausible ordering. A strategy that passes walk-forward but fails Monte Carlo has a real edge you cannot afford to trade at the size you wanted. A strategy that passes Monte Carlo but fails walk-forward has manageable path risk on an edge that does not exist. You need both green before live capital.

Summary

Your backtest is one ordering of many; its drawdown number is path-dependent and usually optimistic
Monte Carlo resamples your trades into thousands of paths to reveal the full outcome distribution
Shuffling isolates ordering risk; bootstrapping also captures sampling uncertainty in the edge
Measure tails: 95th-pct max drawdown, P(ruin), and 5th/50th/95th-pct CAGR, not the average path
Size from the 95th-percentile drawdown so a realistic bad run does not force you to quit
MC assumes independence and stationarity; it cannot tell you whether the edge is real
Validate the edge with walk-forward and overfitting checks first, then run Monte Carlo on what survives

Frequently asked questions

What does Monte Carlo simulation do for a trading strategy?

It takes your historical trade results and generates thousands of alternative equity curves by reshuffling or resampling those trades. Your real backtest is just one ordering of the trades that actually happened. Monte Carlo shows you the full distribution of outcomes that the same edge could have produced, including drawdowns far deeper than the one you saw historically. You use it to estimate worst-case drawdown, risk of ruin, and the spread of plausible returns before risking capital.

How many Monte Carlo runs should I do?

For stable percentile estimates, run at least 5,000 simulations; 10,000 is a common standard and cheap to compute for trade-level resampling. The tail percentiles (95th, 99th) are the ones that need the most samples to stabilize, so if your 95th-percentile drawdown still jumps around between batches, increase the run count until it converges to within a percentage point or two.
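
A convergence check along these lines can be sketched by running two independently seeded batches and comparing their 95th-percentile drawdowns. The function names, seeds, trade list, and the two-percentage-point tolerance are illustrative assumptions.

```python
import math
import random


def max_drawdown(returns):
    """Largest peak-to-trough decline (as a fraction) of the compounded curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def dd_p95(trades, n_paths, seed):
    """95th-percentile max drawdown over n_paths bootstrap paths (nearest rank)."""
    rng = random.Random(seed)
    drawdowns = sorted(
        max_drawdown(rng.choices(trades, k=len(trades))) for _ in range(n_paths)
    )
    return drawdowns[math.ceil(95 * n_paths / 100) - 1]


# Hypothetical trade returns as fractions of equity.
trades = [0.04, -0.02, 0.03, -0.03, 0.06, -0.02, 0.01, -0.04] * 5

batch_a = dd_p95(trades, n_paths=5000, seed=1)
batch_b = dd_p95(trades, n_paths=5000, seed=2)
gap = abs(batch_a - batch_b)

print(f"batch A: {batch_a:.1%}   batch B: {batch_b:.1%}   gap: {gap:.2%}")
print("stable" if gap <= 0.02 else "increase the run count")
```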

What is the difference between trade shuffling and bootstrap resampling?

Trade-sequence shuffling reorders your actual trades without replacement, so every simulated path uses each trade exactly once and terminal equity is identical across paths; only the drawdown path changes. Bootstrap resampling draws trades with replacement, so a path can repeat good or bad trades and skip others, which changes both terminal equity and drawdown. Shuffling isolates path-ordering risk; bootstrapping also captures sampling uncertainty in the edge itself.

Can Monte Carlo simulation validate an overfit strategy?

No. Monte Carlo only reshuffles the trades you give it. If those trades came from a curve-fitted strategy with no real edge, the simulation will faithfully reproduce thousands of variations of that noise and may even look reassuring. Monte Carlo measures path risk, not whether the edge is real. Test for overfitting first with out-of-sample and walk-forward analysis, then run Monte Carlo on what survives.

What metrics should I reject a strategy on after Monte Carlo?

Set thresholds before you look at the output. Common rejection rules: any path that reaches ruin (a non-zero probability of ruin at your real position size), a 95th-percentile maximum drawdown larger than you can psychologically and financially tolerate, or a 5th-percentile CAGR that is negative. If the realistic bad case would force you to stop trading the system, the strategy fails regardless of how good the median path looks.
