From the June 01, 2011 issue of Futures Magazine

How to backtest trading systems and avoid curve fitting

To judge how well a trading system is likely to work in the future, we backtest it on past market data. Backtesting applies a set of trading rules to historical data to estimate how those rules would have performed had we actually traded them. Good hypothetical results do not guarantee that a set of rules will work well in the future. Poor hypothetical results, however, almost certainly mean a system should not be traded in real time.
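To make the idea concrete, here is a minimal sketch of what a backtest does mechanically: walk through historical closes bar by bar, apply a rule and tally the hypothetical profit. The moving-average rule, the function name `backtest_sma_rule` and the `lookback` parameter are hypothetical illustrations, not a system from this article, and the sketch ignores commissions, slippage and contract multipliers.

```python
from typing import Sequence


def backtest_sma_rule(closes: Sequence[float], lookback: int = 3) -> float:
    """Hypothetical long-only rule: go long one unit when the close rises above
    the simple moving average of the previous `lookback` closes, and exit when
    it falls below. Returns total hypothetical profit in price points
    (no commissions, slippage or contract multiplier)."""
    in_position = False
    entry_price = 0.0
    pnl = 0.0
    for i in range(lookback, len(closes)):
        sma = sum(closes[i - lookback:i]) / lookback  # average of the prior bars only
        price = closes[i]
        if not in_position and price > sma:
            in_position, entry_price = True, price     # assume a fill at this bar's close
        elif in_position and price < sma:
            pnl += price - entry_price                 # close the trade at this bar's close
            in_position = False
    if in_position:
        pnl += closes[-1] - entry_price                # mark any open trade to the last close
    return pnl


if __name__ == "__main__":
    print(backtest_sma_rule([1, 2, 3, 4, 5], lookback=3))
```

With the closes 1 through 5 and a three-bar lookback, the rule goes long at 4 and the open trade is marked at 5, for a hypothetical gain of 1 point. Every number such a loop produces depends on the fill and data assumptions built into it.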

The perceived value of backtesting is rooted in the belief that historical tendencies repeat. Traders have tested strategies on historical data for generations, but the practice became widespread with the advent of personal computers and purpose-built system-testing software such as System Writer, which evolved into TradeStation. This software, combined with a database of historical prices, allowed traders without a programming background to test their ideas. Broader understanding and acceptance of trading systems, along with the frustration many traders encountered when trying to build systems on their own, helped the market for third-party systems flourish throughout the 1990s.

Futures Truth is an independent company that has tracked commercially available trading systems since the 1980s and currently tracks more than 500 of them. It tests systems in real time rather than on historical data. This prevents rules from being modified over time and better reflects how the rules execute in actual market conditions, such as periods of high volatility. According to Futures Truth, only about 45% of the tracked systems are profitable in the long term, and only 20% have exhibited a good risk/reward ratio. These figures are likely better than those for the broader population of systems, because only vendors truly confident in their logic submit it to Futures Truth for real-time analysis and public critique.

Many systems fail because they lack a valid premise; their entry and exit parameters are instead derived from data mining, which simply scans historical data for rules that would have worked in the past. Such rules are often fit precisely to past data and have no hope of performing better than chance on unseen data. System development should instead start with a theory that can be tested, analyzed and fine-tuned for application. This also implies a different view of system testing itself: The goal of backtesting is not to produce a collection of hypothetical profit and loss statistics, but to test the validity of the theory and how accurately the rules capture the premise.
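One common guard against curve fitting (a standard technique, not a procedure described in this article) is to choose parameters on one slice of history and then judge them only on data the optimization never saw. The sketch below assumes a single holdout split; the names `holdout_check` and `momentum_pnl`, the default 70/30 split and the momentum rule are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def momentum_pnl(prices: Sequence[float], n: int) -> float:
    """Hypothetical rule: if the close is above the close n bars ago,
    hold one unit for the next bar. Returns total points gained."""
    pnl = 0.0
    for i in range(n, len(prices) - 1):
        if prices[i] > prices[i - n]:
            pnl += prices[i + 1] - prices[i]
    return pnl


def holdout_check(
    prices: Sequence[float],
    params: Sequence[int],
    backtest: Callable[[Sequence[float], int], float],
    split: float = 0.7,
) -> Tuple[int, float, float]:
    """Pick the parameter with the best in-sample result, then report how
    that same parameter performs on the untouched out-of-sample slice."""
    cut = int(len(prices) * split)
    in_sample, out_of_sample = prices[:cut], prices[cut:]
    scores = {p: backtest(in_sample, p) for p in params}
    best = max(params, key=lambda p: scores[p])  # optimization sees in-sample data only
    return best, scores[best], backtest(out_of_sample, best)


if __name__ == "__main__":
    history = [100, 101, 103, 102, 104, 107, 106, 108, 107, 105, 104, 106, 103, 101]
    best, in_pnl, out_pnl = holdout_check(history, [1, 2, 3], momentum_pnl)
    print(f"best n={best}: in-sample {in_pnl:+.1f}, out-of-sample {out_pnl:+.1f}")
```

A parameter that looks strong in-sample but collapses out-of-sample is a warning sign that it was mined from the data rather than derived from a premise. A single split is the simplest version; it does not by itself prove a theory valid.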

System testing is a multifaceted process that spans the data, the time scale, order-entry assumptions, contract specifics and risk control. Getting any of these wrong can ruin an otherwise valid test, and manipulating them can generate results far superior to what we would achieve in real time. You need to do it right if you hope to validate (or, when appropriate, invalidate) your system.

Tools of the trade

There are two elements to backtesting: The proper tools — software and data — and a scientific method to develop systems using those tools. Let’s start by looking at the tools of the trade.

Many options are available for testing your ideas. They differ in how easily ideas can be turned into code and in how they handle the details, which can have a major impact on the results. For example, if a system enters on a limit order, some software records a fill as soon as that price is touched. There is no guarantee such an order would have been filled in real trading, nor is there a guarantee it would not have been. Entering on a stop guarantees an entry, but not a price.
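The difference between fill assumptions can be expressed as two small functions that operate on OHLC bars. This is a sketch under stated assumptions: the `Bar` type and the function names are hypothetical, the "penetration" rule (requiring price to trade at least one tick through the limit) is one common conservative convention rather than a standard, and slippage beyond gap openings is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def buy_limit_fill(bar: Bar, limit: float, tick: float,
                   require_penetration: bool = False) -> Optional[float]:
    """Return the assumed fill price of a buy limit order on this bar, or None.
    Touch rule: filled if the low reaches the limit.
    Penetration rule (more conservative): filled only if the low trades
    at least one tick below the limit."""
    if bar.open < limit:
        return bar.open  # gapped below the limit: assume a fill at the open
    threshold = limit - tick if require_penetration else limit
    return limit if bar.low <= threshold else None


def buy_stop_fill(bar: Bar, stop: float) -> Optional[float]:
    """Return the assumed fill price of a buy stop order on this bar, or None.
    The entry is certain once the high reaches the stop, but the price is not:
    if the bar opens above the stop, the fill is assumed at the (worse) open.
    Intrabar slippage is ignored."""
    if bar.open >= stop:
        return bar.open
    return stop if bar.high >= stop else None
```

On a bar whose low is exactly 99.50, a 99.50 buy limit counts as filled under the touch rule but not under the penetration rule, which is the ambiguity the paragraph describes. Running the same system under both rules is a cheap way to see how sensitive its results are to that assumption.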

Another issue is recording tradable prices. While most professionally developed software no longer has this problem, it remains a concern for those who test systems manually in spreadsheets such as Microsoft Excel. For example, suppose a system buys on a stop equal to the close plus one-third of the average range over the last three periods. If the average range is 10, we are buying at the close plus 3.333. The E-mini S&P 500, however, trades in ticks of 0.25 index points, so the entry offset must be rounded up to 3.50. A beginning trader crunching numbers by hand may not realize this, and not long ago many professional programs made the same mistake. Over many trades, such an error can add up to a sizable discrepancy.
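The rounding step in the example can be sketched as follows, assuming the 0.25 tick size cited above. The function name `buy_stop_price` and the small tolerance used to absorb floating-point noise are assumptions made for illustration.

```python
import math
from typing import Sequence


def buy_stop_price(close: float, ranges: Sequence[float], tick: float = 0.25) -> float:
    """Hypothetical entry from the example: close plus one-third of the average
    range of the last three bars, with the offset rounded UP to a whole tick."""
    avg_range = sum(ranges[-3:]) / 3
    raw_offset = avg_range / 3
    # Tiny tolerance so an offset that is already an exact tick multiple
    # is not pushed up one tick by floating-point noise.
    ticks = math.ceil(raw_offset / tick - 1e-9)
    return close + ticks * tick


if __name__ == "__main__":
    print(buy_stop_price(1300.00, [9.0, 10.0, 11.0]))
```

With a close of 1300.00 and an average range of 10, the naive entry would be 1303.333..., a price the E-mini cannot trade; the sketch returns 1303.50. Rounding up rather than to the nearest tick reflects that a buy stop cannot trigger before price actually trades at a tick at or above the computed level.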

In the big picture, however, such procedural details are minor. The big issue is data.
