How to Evaluate an Investment Strategy?

The Statistical Way

Investment management is no longer a game exclusive to the rich. As investing becomes more accessible, more people are taking control of their own investment decisions, which makes evaluating investment strategies crucial. We live in a world obsessed with absolute returns. When people talk about investment giants, they usually cite annualized returns and rarely discuss the risks those strategies carried, which is a basic error. Most people with some professional knowledge at least use the Sharpe ratio, a risk-adjusted return measure, to evaluate investment strategies. Another common obsession is the "two sigma" standard, where sigma stands for standard deviation. Why two sigma? A result two standard deviations away from zero corresponds to roughly 95% confidence that the strategy's positive returns are real and not randomly generated. What is its significance? Finding a "two sigma" result is often taken to mean finding a new source of investment returns, though this statement is not entirely accurate.

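While the two sigma standard is not strict enough (we'll discuss this later), the number of standard deviations, which statisticians call the t-statistic, is indeed more objective than the Sharpe ratio alone because it also takes the length of the track record into account. In this article, "standard deviation" is shorthand for that number of sigmas. A simple way to calculate it from an annualized Sharpe ratio is: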

Standard Deviation = Sharpe Ratio × √Years

For example:

  • If a strategy's Sharpe ratio is 1 and it is tested for four years, the standard deviation is 2.

  • If a strategy's Sharpe ratio is 2 and it is tested for four years, the standard deviation is 4.

  • If a strategy's Sharpe ratio is 1 and it is tested for nine years, the standard deviation is 3.
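The three cases above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name sigma_from_sharpe is made up for illustration, and it assumes the Sharpe ratio is annualized and the test period is measured in years.

```python
import math


def sigma_from_sharpe(sharpe_ratio: float, years: float) -> float:
    """Number of standard deviations implied by an annualized Sharpe ratio
    observed over a test period of `years` years (hypothetical helper)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return sharpe_ratio * math.sqrt(years)


# The three cases from the list above: (Sharpe ratio, years tested).
for sharpe, years in [(1, 4), (2, 4), (1, 9)]:
    sigma = sigma_from_sharpe(sharpe, years)
    print(f"Sharpe {sharpe}, {years} years -> {sigma:.1f} sigma")
```

The same Sharpe ratio earns a higher score the longer it has been sustained, which is why the test period matters as much as the ratio itself.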

This evaluation method is in line with criticisms of the Sharpe ratio voiced by some investment giants. For example, Cliff Asness often complains that people judge strategies on ten years of data while looking only at the Sharpe ratio. The standard deviation method therefore appears more objective. What is the standard deviation of common strategies? Let's take four: trend, value, growth, and the market index.

Over an 87-year test period, the standard deviations for the four strategies are 6.16, 3.82, 3.45, and 3.82, respectively. All are well above 2, suggesting that reaching "two sigma" is not particularly difficult. The results look impressive: they indicate that the long-term returns of these investment approaches are positive. This might seem trivial, but it matters to those who believe the stock market saying "seven lose, two break even, one gains," meaning that only one investor in ten makes money. Even simply holding an index for the long term has a minimal chance of loss.

Consider another more realistic scenario: a strategy with a 140% return over ten years. Should we regard it as an effective strategy due to its long-term upward trend? The answer depends entirely on the evaluation method.

What is a reasonable measure for evaluating an investment strategy?

There's no one-size-fits-all answer. Historically, "three sigma" would have been a more reasonable standard than "two sigma." This might sound silly, since it merely raises the bar, and raising the bar does not by itself explain why two sigma is unacceptable. The real reason is "multiple testing": once many strategies have been tried, the difference between 95% confidence and 99% confidence becomes significant.

A "two sigma" result corresponds to less than 95% confidence once more than one test has been run, and the shortfall depends on the number of tests: the more tests, the lower the actual confidence level.
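To get a feel for how quickly confidence erodes, the sketch below computes the chance that at least one worthless strategy passes a 5% test purely by luck. It assumes the tests are independent, which real backtests rarely are, and the function name chance_of_false_discovery is made up for illustration.

```python
def chance_of_false_discovery(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent, worthless
    strategies passes a test at significance level `alpha` by luck alone
    (hypothetical helper; independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


for n in (1, 10, 50, 200):
    p = chance_of_false_discovery(n)
    print(f"{n:>4} tests: {p:.1%} chance of at least one lucky 'two sigma' result")
```

Under these assumptions, a single test keeps the false-discovery chance at 5%, but with fifty tests a lucky "two sigma" result becomes more likely than not.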

For example, consider a national dice-throwing contest with a billion participants. After ten rounds, there will very likely be at least one person who has rolled ten consecutive sixes purely by chance. The contest organizer might present this person as having divine dice-throwing skills, using a binomial distribution to calculate a p-value and claiming near 100% confidence in this person's skill. The flaw is evident, and investment strategies work the same way: if only the successful strategies are kept out of the many that were tried, judging the survivor by "two sigma" is too lenient.
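The dice contest is easy to check numerically. The sketch below assumes fair six-sided dice and independent participants; the function names are made up for illustration.

```python
import math


def prob_all_sixes(rounds: int) -> float:
    """Chance that one fair die shows a six in every one of `rounds` throws."""
    return (1.0 / 6.0) ** rounds


def expected_lucky_players(participants: int, rounds: int) -> float:
    """Expected number of participants who roll only sixes by pure luck."""
    return participants * prob_all_sixes(rounds)


def prob_at_least_one_lucky(participants: int, rounds: int) -> float:
    """Chance that at least one participant rolls only sixes.
    log1p keeps the calculation accurate for very small probabilities."""
    p = prob_all_sixes(rounds)
    return 1.0 - math.exp(participants * math.log1p(-p))


players, rounds = 1_000_000_000, 10
print(f"Expected lucky players: {expected_lucky_players(players, rounds):.1f}")
print(f"Chance of at least one: {prob_at_least_one_lucky(players, rounds):.6f}")
```

For any single player, ten sixes in a row is a roughly one-in-sixty-million event, yet with a billion players the arithmetic makes at least one such "champion" nearly certain.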

Even "three sigma" is a rule of thumb and not a precise threshold, because most strategy developers test many strategies. If one strategy out of fifty reaches "three sigma," we have roughly 95% confidence that it is effective. With hundreds or thousands of simulations, three or even four sigma might still be insufficient for a definitive conclusion.

The example strategy introduced earlier, with its 140% return over ten years, was in fact the best of 200 randomly simulated strategies. Although its standard deviation works out to about 3, it does not hold up once the 200 simulations are taken into account: it fails to reach significance at the 5% level, which is consistent with it having been randomly generated.
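One common way to make this adjustment is the Bonferroni correction, which multiplies the single-test p-value by the number of strategies tried. The article does not say which adjustment was used, so the sketch below is only an illustration under that assumption, together with a two-sided normal approximation; the function names are hypothetical.

```python
from statistics import NormalDist


def single_test_p_value(sigma: float) -> float:
    """Two-sided p-value for a result `sigma` standard deviations from zero,
    using a normal approximation (an assumption of this sketch)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(sigma)))


def bonferroni_p_value(sigma: float, num_tests: int) -> float:
    """Bonferroni-adjusted p-value: the single-test p-value multiplied by
    the number of strategies tried, capped at 1."""
    return min(1.0, single_test_p_value(sigma) * num_tests)


print(f"3 sigma, 1 test:    p = {single_test_p_value(3.0):.4f}")
print(f"3 sigma, 200 tests: p = {bonferroni_p_value(3.0, 200):.2f}")
```

Taken alone, a three sigma result looks highly significant. After adjusting for 200 attempts, the p-value is far above 0.05, so the result is no longer distinguishable from luck.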

For a 95% confidence level in a strategy out of 10,000 simulations, a standard deviation of around 4.5 is required.
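A figure of this size can be reproduced, at least approximately, by asking how many sigmas a single strategy needs so that the overall false-positive rate stays at 5% across all simulations. The sketch below assumes a Bonferroni-style split of the 5% across tests and a two-sided normal approximation. The article does not state how its number was derived, and required_sigma is a made-up name.

```python
from statistics import NormalDist


def required_sigma(num_tests: int, alpha: float = 0.05) -> float:
    """Sigma threshold one strategy must clear so that the overall
    false-positive rate across `num_tests` strategies stays near `alpha`
    (Bonferroni split, two-sided normal approximation; both assumptions)."""
    per_test_alpha = alpha / num_tests
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


for n in (1, 50, 200, 10_000):
    print(f"{n:>6} tests -> about {required_sigma(n):.2f} sigma required")
```

With a single test the threshold is the familiar value just under 2, and it climbs steadily as the number of simulations grows.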

While I might seem against data-driven approaches, data-driven strategies with high standard deviations after multiple testing adjustments can be effective, potentially revealing new market anomalies. For instance, Cliff Asness discussed trend investing's excess returns despite not knowing why they exist, and Andrew Ang discussed low-risk investments' anomalies without clear explanations.

For theory-driven investment strategies, the statistical requirements can be lower because the strategies are grounded in economic reality. Examples include Michael Burry's precise calculation of the credit market's limits and Fischer Black's recognition that the Value Line market index was mispriced, which led to classic statistical arbitrage strategies.

Standard deviation as an evaluation method is merely a reference and not absolute. It doesn't consider most strategies' tail risks, exaggerates the attractiveness of spread and short volatility strategies, and ignores the broader significance of strategies, such as trend strategies' diversification effect on value strategies.

Evaluating strategies and selecting investment managers is a complex process that cannot be reduced to a few numerical values. Below are references for readers interested in investment strategies. Keeping even the simplest notion of "multiple testing" in mind can lead to more reasonable strategy selection.

References

  • Ang, A. (2014). Asset management: A systematic approach to factor investing. Oxford University Press.

  • Asness, C. (2011). Momentum in Japan: The exception that proves the rule. The Journal of Portfolio Management, 37(4), 67-75.

  • Asness, C.S., Moskowitz, T.J., & Pedersen, L.H. (2013). Value and momentum everywhere. The Journal of Finance, 68(3), 929-985.

  • Gray, W.R., & Vogel, J.R. (2016). Quantitative Momentum: A Practitioner's Guide to Building a Momentum-Based Stock Selection System. John Wiley & Sons.

  • Harvey, C.R., & Liu, Y. (2014). Evaluating trading strategies. The Journal of Portfolio Management, 40(5), 108-118.

  • Jaye, N. (2014). The Art of Knowing Nothing Brilliantly. CFA Institute Magazine, 25(3), 34-37.

  • Pedersen, L.H. (2015). Efficiently inefficient: how smart money invests and market prices are determined. Princeton University Press.

  • Narang, R.K. (2013). Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading (Vol. 883). John Wiley & Sons.

  • Schwager, J.D. (2012). Hedge fund market wizards: How winning traders win. John Wiley & Sons.

Disclaimer: The data and information mentioned are from third-party sources, and accuracy is not guaranteed. This article shares information and views, not professional investment advice. Consult professional advice before making investment decisions.

This article was originally written in Chinese and posted on my WeChat platform in 2017.