Sharpe vs Sortino vs Calmar: Which Risk-Adjusted Return Metric to Trust

Why raw returns are useless without risk context

Two strategies both return 40% in a year. The first grinds out roughly 3% a month, and its worst losing month is −2%. The second makes its entire gain in two explosive weeks and spends much of the rest of the year in a drawdown as deep as 35% before recovering. The headline number is identical. The experience of trading them, and the capital required to survive them, could not be more different.

Return alone tells you how far you traveled, not how violently the ride threw you around or how close you came to a margin call. Risk-adjusted metrics divide return by a measure of risk so you can compare strategies on the same scale. The catch: there is no single agreed-upon definition of "risk." Standard deviation, downside deviation, and maximum drawdown each capture something different, and each produces a different ranking of the same set of strategies. Choosing a metric is not a formality. It is an implicit statement about what you are afraid of.

The Sharpe ratio

The Sharpe ratio, introduced by William F. Sharpe in 1966, is the default risk-adjusted measure in the industry. It divides the return earned above the risk-free rate by the standard deviation of returns. In plain terms: how much excess return are you getting per unit of total volatility.

Sharpe ratio

Sharpe = (Rₚ − R_f) ÷ σₚ

where

Rₚ = portfolio return for the period

R_f = risk-free rate for the period

σₚ = standard deviation of period returns

Annualized Sharpe = per-period Sharpe × √N

(N = periods per year: 252 daily, 12 monthly)

Worked example. A strategy returns an average of 1.2% per month with a monthly standard deviation of 2.8%. The risk-free rate is 0.35% per month (about 4.2% annualized).

Monthly excess = 1.2% − 0.35% = 0.85%

Monthly Sharpe = 0.85% ÷ 2.8% = 0.304

Annualized = 0.304 × √12 = 1.05
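The arithmetic above can be checked in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names `sharpe_ratio` and `annualize` are illustrative, and the mean and standard deviation are taken as given inputs instead of being estimated from a return series.

```python
import math

def sharpe_ratio(mean_return, risk_free, std_dev):
    """Per-period Sharpe: excess return divided by standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_return - risk_free) / std_dev

def annualize(per_period_ratio, periods_per_year):
    """Scale a per-period ratio by sqrt(N); this assumes i.i.d. returns."""
    return per_period_ratio * math.sqrt(periods_per_year)

# Inputs from the worked example, in percent per month.
monthly = sharpe_ratio(mean_return=1.2, risk_free=0.35, std_dev=2.8)
annual = annualize(monthly, periods_per_year=12)

print(f"monthly Sharpe:    {monthly:.3f}")  # 0.304
print(f"annualized Sharpe: {annual:.2f}")   # 1.05
```

Passing `periods_per_year=252` instead would apply the daily scaling factor to a Sharpe computed from daily returns.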

What it rewards: consistency. A strategy with small, regular returns and tight dispersion scores well. This is exactly what allocators want, which is why Sharpe became the lingua franca of the industry.

Where it misleads you: first, it penalizes upside volatility just as hard as downside. A month where you make +12% widens your standard deviation and lowers your Sharpe, even though no trader on earth complains about an outsized winner. Second, it assumes returns are roughly normally distributed; strategies with fat tails or heavy skew (selling options, carry trades, mean reversion into trends) report a flattering Sharpe right up until the tail event arrives. Third, it is acutely sensitive to sampling frequency and window. Compute it from daily data versus monthly data and you get different numbers; shift the start date by three months and a Sharpe of 1.8 can become 0.9.

The Sortino ratio

The Sortino ratio fixes the first complaint about Sharpe: it stops punishing you for making money. Instead of total standard deviation, the denominator is downside deviation, the volatility of only those returns that fall below a target (usually zero, or the risk-free rate). Upside swings no longer count against you.

Sortino ratio

Sortino = (Rₚ − T) ÷ DD

downside deviation

DD = √( Σ min(Rᵢ − T, 0)² ÷ N )

T = target / minimum acceptable return (often 0)

Worked example. Take twelve monthly returns: eight months at +3%, and four losing months at −1%, −4%, −2%, and −3%. The average monthly return is (8 × 3 − 10) ÷ 12 = 14 ÷ 12 ≈ +1.167%. With a target of 0%, only the four negative months feed the downside deviation.

Sum of squared shortfalls (below 0):

(−1)² + (−4)² + (−2)² + (−3)² = 1 + 16 + 4 + 9 = 30

DD = √(30 ÷ 12) = √2.5 = 1.581%

Monthly Sortino = 1.167% ÷ 1.581% = 0.738

Annualized = 0.738 × √12 = 2.56

Note that downside deviation divides by the full count N, not just the number of losing months, which keeps the metric comparable across strategies with different hit rates. Sortino is the better lens whenever your return distribution is asymmetric on purpose: trend-following systems that take many small losses and a few enormous wins, breakout strategies, or any approach where the big upside moves are the entire point. Sharpe would dock those strategies for their best months; Sortino correctly ignores them.
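A short Python sketch of the calculation, assuming a plain list of per-period returns in percent. The names `downside_deviation` and `sortino_ratio` are illustrative. The sketch computes the mean directly from the series rather than taking it as an input, and it divides the squared shortfalls by the full count as described above.

```python
import math

def downside_deviation(returns, target=0.0):
    """Root-mean-square shortfall below target, divided by the FULL count."""
    if not returns:
        raise ValueError("returns must not be empty")
    shortfalls = [min(r - target, 0.0) ** 2 for r in returns]
    return math.sqrt(sum(shortfalls) / len(returns))

def sortino_ratio(returns, target=0.0):
    """Per-period Sortino: mean return above target over downside deviation."""
    dd = downside_deviation(returns, target)
    if dd == 0:
        raise ValueError("no returns below target; Sortino is undefined")
    mean = sum(returns) / len(returns)
    return (mean - target) / dd

# The twelve monthly returns from the worked example, in percent.
monthly_returns = [3.0] * 8 + [-1.0, -4.0, -2.0, -3.0]

dd = downside_deviation(monthly_returns)
monthly_sortino = sortino_ratio(monthly_returns)
annual_sortino = monthly_sortino * math.sqrt(12)

print(f"downside deviation: {dd:.2f}%")
print(f"monthly Sortino:    {monthly_sortino:.3f}")
print(f"annualized Sortino: {annual_sortino:.2f}")
```

Because the winning months contribute zero to the sum, replacing any +3% month with a +12% month raises the mean and leaves the downside deviation untouched.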

The Calmar ratio

The Calmar ratio (named after Terry Young's newsletter, California Managed Accounts Reports) abandons volatility entirely and asks the question every leveraged trader actually cares about: how much did I make relative to the worst peak-to-trough loss I had to sit through?

Calmar ratio

Calmar = Annualized return ÷ |Maximum drawdown|

conventionally measured over a trailing 36-month window

Worked example. A strategy returns 32% annualized. Over the measurement window its equity curve peaked, then fell 18% before making a new high.

Annualized return = 32%

Maximum drawdown = 18%

Calmar = 32 ÷ 18 = 1.78
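Maximum drawdown is usually computed by walking the equity curve and tracking the running peak. The sketch below shows one way to do that. The equity curve is hypothetical, chosen only to reproduce the 18% drawdown in the example, and the 32% annualized return is taken from the example rather than derived from the curve. Function names are illustrative, and equity values are assumed to be positive.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # loss measured from that mark
    return worst

def calmar_ratio(annualized_return, max_dd):
    """Annualized return per unit of maximum drawdown."""
    if max_dd == 0:
        raise ValueError("no drawdown in the window; Calmar is undefined")
    return annualized_return / abs(max_dd)

# Hypothetical equity curve: peaks at 100, falls to 82, then makes a new high.
equity = [90.0, 100.0, 94.0, 82.0, 91.0, 105.0]

mdd = max_drawdown(equity)
print(f"max drawdown: {mdd:.0%}")                 # 18%
print(f"Calmar: {calmar_ratio(0.32, mdd):.2f}")   # 1.78
```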

Drawdown-based metrics matter for two reasons that standard-deviation metrics miss. First, survivability. A strategy does not blow up because its variance was high; it blows up because the cumulative loss from a high-water mark exceeded what the account, or the trader's nerve, could absorb. Maximum drawdown measures that directly. Second, leverage. If you run 5x leverage, a 20% strategy drawdown becomes a 100% account loss. Sharpe is leverage-invariant (scaling returns and volatility by the same factor leaves it unchanged), so it tells you nothing about whether a given leverage level is survivable. The Calmar ratio itself is also roughly unchanged by leverage, since return and drawdown scale together, but its denominator is stated in the units that matter for ruin: a maximum drawdown of 20% tells you directly that 5x leverage would have wiped out the account. The weakness of Calmar is that maximum drawdown is a single worst-case point estimate, extremely sensitive to the window and to one bad event, and it says nothing about how often or how quickly drawdowns occur.
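The leverage point can be illustrated with a small experiment. This is a sketch under simplifying assumptions: leverage multiplies each period's return by a constant factor, financing costs and the risk-free rate are zero, and the return series is hypothetical.

```python
import statistics

def sharpe(returns):
    """Per-period Sharpe with a zero risk-free rate (simplifying assumption)."""
    return statistics.mean(returns) / statistics.stdev(returns)

def max_drawdown(returns):
    """Worst peak-to-trough loss of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

# Hypothetical monthly returns, as fractions, made up for illustration.
base = [0.04, -0.06, -0.05, 0.07, 0.03, -0.08, 0.09, 0.05]

for leverage in (1, 3, 5):
    levered = [leverage * r for r in base]
    print(f"{leverage}x  Sharpe={sharpe(levered):.3f}  "
          f"max drawdown={max_drawdown(levered):.1%}")
```

The Sharpe column prints the same value on every row while the drawdown column grows with leverage, which is why the drawdown figure, not the Sharpe, is the number to consult when sizing leverage.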

The same three strategies, scored three ways

Here is why the choice of metric is not academic. Below are three strategies with roughly comparable annual returns, scored by all three ratios. Watch how the ranking reshuffles depending on which one you trust.

Strategy             Sharpe   Sortino   Calmar
A: Mean reversion    1.9      1.7       0.8
B: Trend following   0.8      1.9       1.1
C: Low-vol carry     1.4      1.3       2.4

Strategy A wins on Sharpe: its returns are tight in most months, exactly what total standard deviation rewards. But it has a fat left tail, so a sharp adverse move produces a deep drawdown and its Calmar collapses to 0.8. Strategy B looks mediocre on Sharpe (its huge winning months inflate the standard deviation) but is the Sortino champion because almost all of that volatility is upside. Strategy C has the shallowest worst-case drawdown, so it dominates on Calmar even though its raw return profile is unremarkable. Optimize for Sharpe and you ship A. Optimize for Sortino and you ship B. Optimize for Calmar and you ship C. Same three strategies, three different decisions.
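The reshuffle is easy to see by sorting the same scores three ways. The snippet below uses only the figures from the comparison above; the `scores` dictionary and `rank_by` helper are illustrative names.

```python
# Scores copied from the comparison above.
scores = {
    "A: Mean reversion":  {"Sharpe": 1.9, "Sortino": 1.7, "Calmar": 0.8},
    "B: Trend following": {"Sharpe": 0.8, "Sortino": 1.9, "Calmar": 1.1},
    "C: Low-vol carry":   {"Sharpe": 1.4, "Sortino": 1.3, "Calmar": 2.4},
}

def rank_by(metric):
    """Strategy names ordered from best to worst on one metric."""
    return sorted(scores, key=lambda name: scores[name][metric], reverse=True)

for metric in ("Sharpe", "Sortino", "Calmar"):
    print(f"{metric:8s}", " > ".join(rank_by(metric)))
```

Each metric puts a different strategy first, and the Sharpe winner is last on Calmar.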

A high Sharpe means nothing if the backtest is overfit

All three ratios are computed on a finite history, and any number can be tuned upward by curve-fitting parameters to the past. Before you trust a 2.5 Sharpe, run the equity curve through our Strategy Overfitting Score to see how well it holds up.

How these numbers get gamed and inflated

Every risk-adjusted metric is a single number distilled from a return series, and anyone presenting one controls how that series was built. The most common ways the figures get inflated, deliberately or by carelessness:

Cherry-picked windows
Start the measurement right after the worst drawdown and end at a peak. Shifting the window by a few months can move a Sharpe from 0.9 to 1.8. Always ask for the full available history, not a hand-picked slice.
Monthly vs daily sampling
Sampling monthly smooths over intramonth volatility and drawdowns, which typically flatters Sharpe and Sortino and understates maximum drawdown. A strategy that looks like 1.6 on monthly data may be 1.0 on daily data. Always disclose the sampling frequency alongside the number.
Ignoring fees, spread, and slippage
Gross-of-cost ratios are fiction for any strategy that trades often. A high-turnover system with a 2.0 gross Sharpe can drop below 1.0 once realistic commissions, spread, and funding are subtracted from each fill.
Survivorship bias
Reporting the Sharpe of the strategies that survived while quietly dropping the ones that blew up inflates the average. The same applies to backtests on asset universes that exclude delisted or bankrupt names.
Hidden tail risk
Selling out-of-the-money options or running uncapped carry produces a beautiful Sharpe for years because losses are rare. The metric simply has not sampled the tail yet. A high ratio over a short, calm period is not evidence of skill.
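Window sensitivity is simple to measure for yourself: compute the ratio on every contiguous window of a fixed length and compare the best slice with the worst. The sketch below does this for a 24-month series. The return figures are hypothetical, made up purely for illustration, and the risk-free rate is assumed to be zero.

```python
import math
import statistics

def annualized_sharpe(returns, periods_per_year=12):
    """Annualized Sharpe with a zero risk-free rate (simplifying assumption)."""
    per_period = statistics.mean(returns) / statistics.stdev(returns)
    return per_period * math.sqrt(periods_per_year)

def rolling_sharpes(returns, window):
    """Annualized Sharpe for every contiguous window of the given length."""
    return [annualized_sharpe(returns[i:i + window])
            for i in range(len(returns) - window + 1)]

# Hypothetical 24 months of returns in percent, made up for illustration.
history = [2.1, 1.4, -0.8, 3.0, -4.5, -2.2, 1.9, 2.6, 0.7, -1.1, 3.3, 1.8,
           2.4, -0.3, 1.2, 2.9, 0.5, -0.9, 2.2, 1.6, -3.8, 1.1, 2.7, 0.4]

windows = rolling_sharpes(history, window=12)
print(f"full history:          {annualized_sharpe(history):.2f}")
print(f"best 12-month window:  {max(windows):.2f}")
print(f"worst 12-month window: {min(windows):.2f}")
```

A presenter who reports only the best window is not lying about the arithmetic, which is why the full history and the window choice need to be disclosed together.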

Which one to use, and realistic targets

There is no universally correct metric. There is a correct metric for a given strategy and a given question. A practical decision guide:

→ Sortino for asymmetric strategies, trend following, breakouts, anything where large upside moves are the whole point and you do not want them penalized.
→ Calmar for leveraged or path-dependent strategies where survivability and worst-case drawdown decide whether you get liquidated.
→ Sharpe for benchmarking and apples-to-apples comparison across managers, because it is the universal standard everyone reports and allocators expect.

The disciplined move is to report all three. If a strategy scores well on every lens, you have genuine confidence. If the three disagree sharply, that disagreement itself is the signal: it tells you the return distribution is skewed, fat-tailed, or drawdown-prone in a way no single number captures.
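Reporting all three from one return series takes little code. The function below is a sketch under stated assumptions: `risk_report` is an illustrative name, returns are per-period fractions, the risk-free rate and Sortino target default to zero, and the series must contain at least one losing period so that neither denominator is zero. The sample data reuses the twelve monthly returns from the Sortino example in an arbitrary order; the ordering is an assumption and affects only the drawdown.

```python
import math
import statistics

def risk_report(returns, periods_per_year=12, risk_free=0.0, target=0.0):
    """Annualized Sharpe, Sortino and Calmar from per-period fractional returns."""
    n = len(returns)
    mean = statistics.mean(returns)
    scale = math.sqrt(periods_per_year)

    sharpe = (mean - risk_free) / statistics.stdev(returns) * scale

    # Downside deviation divides by the full count, not just the losing periods.
    downside = math.sqrt(sum(min(r - target, 0.0) ** 2 for r in returns) / n)
    sortino = (mean - target) / downside * scale

    # Compound the returns into an equity curve and track the worst drawdown.
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, (peak - equity) / peak)
    annualized_return = equity ** (periods_per_year / n) - 1.0
    calmar = annualized_return / max_dd

    return {"sharpe": sharpe, "sortino": sortino, "calmar": calmar}

# Twelve monthly returns as fractions; the ordering is an assumption.
returns = [0.03, 0.03, -0.01, 0.03, 0.03, -0.04,
           0.03, 0.03, -0.02, 0.03, 0.03, -0.03]

for name, value in risk_report(returns).items():
    print(f"{name:8s}{value:6.2f}")
```

Printing the three numbers side by side makes any sharp disagreement between them visible at a glance.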

Realistic targets for a live, cost-adjusted strategy, not a backtest:

Sharpe > 1.0 = solid · 1.5-2.0 = strong · > 3.0 = suspect

Sortino typically runs ~1.3-1.6× the Sharpe for a healthy strategy

Calmar > 0.5 = acceptable · > 1.0 = good · > 3.0 = rare / likely short window

Treat any backtested ratio as the optimistic ceiling, never the expectation. Live results almost always come in lower once real costs, capacity limits, and regime shifts do their work. The metric is a lens, not a verdict, and the most expensive mistake in quant trading is mistaking a well-fit number for a durable edge.

Summary

Raw return tells you nothing without a measure of the risk taken to earn it
Sharpe = excess return ÷ total standard deviation; great for benchmarking, but penalizes upside and assumes normal returns
Sortino = excess return ÷ downside deviation; the right lens for asymmetric, upside-skewed strategies
Calmar = annualized return ÷ max drawdown; the survivability metric for leveraged, path-dependent systems
The same strategy can rank #1 on one metric and last on another, so report all three
Disclose sampling frequency, window, and costs, or the number is meaningless and easily gamed

Frequently asked questions

What is a good Sharpe ratio for a trading strategy?

For a live, fee-and-slippage-adjusted strategy, a Sharpe of 1.0 is solid, 1.5 to 2.0 is strong, and anything above 3.0 should be treated as suspicious until proven out of sample. Backtest Sharpe values are routinely inflated by overfitting, so discount them heavily. The number is only meaningful when you state the sampling frequency and the measurement window it was computed over.

What is the difference between Sharpe and Sortino?

Both divide excess return by a measure of volatility, but Sharpe uses total standard deviation (penalizing upside and downside moves equally) while Sortino uses only downside deviation (volatility of returns below a target, usually zero or the risk-free rate). Sortino rewards strategies with large positive surprises and small losses; Sharpe treats a big winning month as just as risky as a big losing one.

Why is the Calmar ratio useful for leveraged strategies?

Calmar divides annualized return by maximum drawdown, so it directly measures return per unit of worst-case peak-to-trough loss. For leveraged or path-dependent strategies, the maximum drawdown is what triggers margin calls and forced liquidation, not the standard deviation. A strategy with a great Sharpe can still have a drawdown deep enough to wipe out a leveraged account, which Calmar exposes.

How do you annualize the Sharpe ratio?

Multiply the per-period Sharpe by the square root of the number of periods in a year. For daily returns that is the square root of 252 (about 15.87); for monthly returns it is the square root of 12 (about 3.46). This scaling assumes returns are independent and identically distributed, which is rarely exactly true, so an annualized Sharpe computed from daily data and one from monthly data are not directly comparable.

Can these risk-adjusted metrics be gamed?

Easily. Picking a favorable start and end date, sampling monthly instead of daily to hide intramonth volatility, omitting trading fees and slippage, and ignoring strategies that blew up (survivorship bias) all inflate every one of these ratios. Selling out-of-the-money options produces a high Sharpe for years until the tail event arrives. Always demand the raw equity curve and the exact calculation methodology before trusting a single number.
