To judge how well a given trading system should work in the future, we backtest it on past market data. Backtesting applies a set of trading rules to historical data to estimate how those rules would have performed if we actually had traded them. Good hypothetical historical results do not guarantee that a set of rules will work well in the future. However, poor hypothetical historical results almost certainly mean a system should not be traded in real time.
The perceived value of backtesting is rooted in the belief that historical tendencies repeat. Traders have been testing strategies on historical data for generations. However, the practice became popular with the advent of personal computers and purpose-built system-testing software, such as System Writer, which evolved into TradeStation. This software and a database of historical data allowed those without a code-writing background to test trading system ideas. The broader understanding and acceptance of trading systems, as well as the frustration many encountered when trying to build trading systems on their own, helped the market for third-party systems flourish throughout the 1990s.
Futures Truth is an independent company that has tracked commercially available trading systems since the 1980s. Currently, it tracks more than 500 systems. Futures Truth tests trading systems in real time, not on historical data. This prevents the modification of rules over time and better simulates rule execution in actual market conditions, such as periods of high volatility. According to Futures Truth, only about 45% of the tracked systems are profitable in the long term, and only 20% have exhibited a good risk/reward ratio. However, these numbers likely are better than the broader population’s because only those vendors truly confident in their logic turn it over to Futures Truth for real-time analysis and public critique.
So many systems fail because they lack a valid premise. Instead, the entry and exit parameters are derived from data mining. Data mining simply scans historical data for rules that would have worked in the past. Often, such rules are fit precisely to the past and have no hope of working any better than random on unseen data. Instead, system development should start with a theory that can be tested, analyzed and fine-tuned for application. This concept also implies a different perspective on system testing itself: The goal of backtesting is not to produce a collection of hypothetical profit and loss statistics. It is to test the validity of the theory and the accuracy of the rules in capturing the premise.
System testing is a multifaceted process that spans the data, the time scale, order-entry assumptions, contract specifics and risk control. Failing at any of these can ruin an otherwise valid test, and manipulating them can generate results far superior to what we would achieve in real time. You need to do it right if you hope to validate (or, when appropriate, invalidate) your system.
Tools of the trade
There are two elements to backtesting: The proper tools — software and data — and a scientific method to develop systems using those tools. Let’s start by looking at the tools of the trade.
Many options are available for testing your ideas. They differ in the ease of turning ideas into code and in how they handle the details, which can have a major impact on the results. For example, if a system enters on a limit order, some software records a fill if that price is merely touched. However, there is no guarantee such an order would have been filled in real trading, nor is there a guarantee it would not have been. Entering on stops guarantees an entry, but not a price.
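The limit-order question can be made concrete with a minimal sketch. It assumes bar data with a known low and a known tick size; the function name and the `require_trade_through` switch are hypothetical and do not describe how any particular testing package behaves.

```python
def limit_buy_filled(limit_price, bar_low, tick_size, require_trade_through=True):
    """Decide whether a buy limit order is assumed filled on one bar.

    Hypothetical helper for illustration only.
    """
    if require_trade_through:
        # Conservative assumption: the market must trade at least one
        # tick below our limit before we count the order as filled.
        return bar_low <= limit_price - tick_size
    # Optimistic assumption: a touch of the limit price counts as a fill.
    return bar_low <= limit_price


# A bar whose low exactly touches a 1300.00 limit:
touch_only = limit_buy_filled(1300.00, 1300.00, 0.25, require_trade_through=False)
conservative = limit_buy_filled(1300.00, 1300.00, 0.25, require_trade_through=True)
```

Running a system under both assumptions gives a rough sense of how much of its hypothetical profit depends on fills that may not have happened.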
Another issue is recording real prices. While most professionally developed software no longer has this issue, it is still a concern for those who manually test systems in spreadsheets, such as Microsoft Excel. For example, if a system buys on a stop equal to the close plus one-third of the average range over the last three periods, and if the average range is 10, then we are buying at the close plus 3.333. The E-mini S&P 500, however, trades in ticks of 0.25, so the entry differential must be rounded up to the next valid tick, 3.50. A beginning trader may not realize this if manually crunching numbers, and it wasn’t too long ago that many professional programs made the same mistake. Over time, such an error could add up to a sizable discrepancy.
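For those testing by hand, the rounding step can be sketched as follows. The function name is hypothetical, the close of 1300.00 is an assumed value for illustration, and `Decimal` is used only to avoid floating-point surprises when dividing by the tick size.

```python
import math
from decimal import Decimal


def round_buy_stop_to_tick(raw_price, tick_size):
    """Round a buy-stop price up to the next valid tick (hypothetical helper)."""
    tick = Decimal(str(tick_size))
    # Number of whole ticks needed to reach or exceed the raw price.
    ticks = math.ceil(Decimal(str(raw_price)) / tick)
    return float(ticks * tick)


close = 1300.00                    # assumed close, for illustration only
raw_stop = close + 10 / 3          # close plus one-third of a 10-point range
stop_price = round_buy_stop_to_tick(raw_stop, 0.25)   # 1303.5
```

A sell stop would be rounded down rather than up, so a fuller version would take the order side as a parameter.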
In the big picture, however, such procedural details are minor. The big issue is data.
Understanding market data
The first and most important issue we need to address when we backtest trading systems is data. Futures contracts face an additional consideration because they are finite instruments. Backtesting on futures data over time requires contracts to be spliced together to create longer data series. This is necessary for analysis as well as for those trades that persist across the lifespan of two distinct contracts.
The process is complicated by the price differential between two adjacent contracts. This differential arises from interest rates and other cost-of-carry factors that drive the differences between physical commodities and their derivative futures contracts. Because of this gap, you can’t just splice the contract data together. One, the other or both must be adjusted to eliminate the gap.
In a gap-adjusted continuous contract, every price gap caused by a contract roll is measured and removed. This difference can be applied in one of two ways: It can start at the beginning and work its way forward, or it can start at the end and work its way backward. The second method, called back-adjusting, is most common as it leaves the active contract unadjusted so its prices reflect current reality.
For example, the September E-mini S&P 500 contract closed at 1300.40 on Sept. 11, 2006, while the December contract closed at 1311.60 (see "Market shift," below). The difference of 11.20 points is the rollover gap between the two contracts. Because the difference is positive, the roll gap is up, and to remove it we adjust all previous prices upward by 11.20 points. If the difference were negative, the roll gap would be down, and adjusting all previous prices downward by the same amount would close it. An important point is that back-adjustment is ongoing: It is repeated every time a new contract is added.
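The back-adjustment procedure can be sketched in a few lines. This is a minimal illustration, not the method of any particular data vendor: the function name, the input layout (one list of closes per contract, oldest first, each ending on its roll day) and the prices are all hypothetical.

```python
def back_adjust(segments, roll_gaps):
    """Build a back-adjusted continuous series (hypothetical sketch).

    segments:  list of price lists, oldest contract first; each list holds
               the closes for the period that contract was the active one.
    roll_gaps: roll_gaps[i] is the new contract's close minus the old
               contract's close on the roll day between segment i and i + 1.
    """
    adjusted = []
    cumulative = 0.0
    # Walk backward so the newest contract is left unadjusted.
    for i in range(len(segments) - 1, -1, -1):
        if i < len(segments) - 1:
            cumulative += roll_gaps[i]
        adjusted.append([price + cumulative for price in segments[i]])
    adjusted.reverse()
    return [price for segment in adjusted for price in segment]


# Three made-up contracts with an up gap of 7 and a down gap of 4.
segments = [[100, 102], [110, 111], [108, 109]]
roll_gaps = [7, -4]
continuous = back_adjust(segments, roll_gaps)
# continuous == [103.0, 105.0, 106.0, 107.0, 108.0, 109.0]
```

Note that the oldest prices carry the sum of every later gap, which is why the whole series must be rebuilt each time a new contract is added.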

Back-adjusted contracts have the following issues: Actual price levels can get lost and prices also can become negative; price ratios (for example, the ratio between gold and silver) can’t be calculated using pre-roll prices; volatility as a percentage of prices is distorted; stops can’t be based on a percentage and must be in points only; and pattern-matching algorithms can fail because back-adjusting can change the shape of historical price patterns.
Despite these issues, back-adjusted data still are valid. Price differences are preserved in absolute (point) terms, so any change in price can be converted directly into a dollar value. Of course, you must make sure that you do not generate trade signals based on false prices.
Trading reality
What use are the prices in back-adjusted contracts if they are generally bogus? Suppose we enter a position in lean hogs on the first day in the series and hold it until the end, rolling the position forward from one delivery month to the next. If we ignore commission and slippage, the exact value of the position can be calculated at any time from the back-adjusted chart. That is because back-adjusted contracts represent a special type of reality, namely trading reality: They depict what happens to a trader holding a continuous position in a market. Hence, these contracts are the perfect resource for computerized system tests.
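A small check illustrates the point. The sketch below compares the profit from actually holding and rolling one long contract with the profit read directly off a back-adjusted series. The function names, data layout and prices are hypothetical, rolls are assumed to happen at the roll-day close, and commission and slippage are ignored.

```python
def rolled_pnl_points(segments, roll_gaps):
    """Points earned by holding one long contract and rolling at each
    roll-day close (hypothetical sketch)."""
    total = 0.0
    entry = segments[0][0]
    for i, segment in enumerate(segments):
        exit_price = segment[-1]      # close on the roll day (or final day)
        total += exit_price - entry
        if i < len(roll_gaps):
            # Re-enter in the new contract at its roll-day close.
            entry = exit_price + roll_gaps[i]
    return total


def back_adjusted_pnl_points(segments, roll_gaps):
    """The same profit read off the back-adjusted series: the last price
    minus the first price after all later roll gaps are added to it."""
    first_adjusted = segments[0][0] + sum(roll_gaps)
    return segments[-1][-1] - first_adjusted


# Made-up closes for three contracts; gaps are new close minus old close.
segments = [[100, 102], [110, 111], [108, 109]]
roll_gaps = [7, -4]
held = rolled_pnl_points(segments, roll_gaps)           # 6.0
charted = back_adjusted_pnl_points(segments, roll_gaps)  # 6
```

Multiplying either figure by the contract's dollar value per point gives the dollar profit.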
The next consideration is when we roll the contracts, which is defined by the roll trigger. The roll trigger determines when a continuous contract switches from one futures month to the next. Details may vary with the data management software, but in CSI Unfair Advantage, depending on conditions, the roll trigger can be based on open interest, volume, a specific date or some combination of these (see "When to roll").
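One way to picture a combined roll trigger is the sketch below. It is an assumption for illustration only and does not reproduce the logic of CSI Unfair Advantage or any other package: the function name, parameters and the rule itself (roll when open interest shifts to the next contract, or on a fixed date, whichever comes first) are hypothetical.

```python
from datetime import date


def should_roll(today, front_open_interest, next_open_interest, fixed_roll_date):
    """Return True when the continuous contract should switch months.

    Hypothetical rule: roll as soon as the next contract carries more
    open interest than the front contract, or on a fixed calendar date,
    whichever comes first.
    """
    open_interest_shifted = next_open_interest > front_open_interest
    date_reached = today >= fixed_roll_date
    return open_interest_shifted or date_reached


# Open interest has not shifted yet, but the fixed date has arrived.
roll_today = should_roll(date(2006, 9, 11), 500_000, 400_000, date(2006, 9, 11))
```

Swapping this rule for a date-only or volume-only rule and rerunning the same system is a simple way to see how sensitive the results are to roll timing.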

Rolling data affects market range and liquidity, so the timing of the roll greatly affects trade results. Indeed, different rollover logic can change system performance substantially, both in testing and in real time. For example, a performance difference of 20%-30% is possible between rolling on open interest and rolling on a fixed date.
In addition, rolling on volume and open interest can be risky in some markets, as first notice day can (and often does) occur before the roll. For some trading systems, such as trend-following systems, this can exaggerate the profits.
Another issue to watch for is roll congestion. Money benchmarked to the S&P Goldman Sachs Commodity Index and other long-only commodity indexes has grown exponentially in the last half dozen years, with some estimates putting the size of the indexes near $300 billion. This has changed the nature of front-month and second-option spreads. Many traders and programs try to take advantage of the size of these rolls, which tends to push spreads toward full carry much earlier than would otherwise be the case. The bottom line is that the huge positions being rolled must be considered along with more traditional factors when rolling.
Ratio-adjusted contracts
Ratio-adjusted contracts are another approach. Here, each contract roll is treated like a classic stock split in equities: You get x shares for every y shares you hold, and the market price adjusts to reflect the market capitalization of the company in question.
Suppose a company does a 10-for-one split. For every share you hold, you get 10 new shares, and the price is divided by 10 once the split occurs. This creates a big price gap at the split. To avoid showing this gap, historical data also are divided by 10, so prices are depicted in the same manner as they are today. In effect, pre-split data have been back-adjusted using a proportional ratio.
We can apply the same process of proportional back-adjustment at every contract roll on futures contracts. Here, you need to calculate the adjustment ratio (price of new contract/price of old contract), which keeps the percentage relationship between any two prices relatively constant across the trading history. For example, assume on rollover the old contract closes at 1000 and the new contract is priced at 990. We need to multiply all old prices by 990/1000 to remove the gap between the two contracts.
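Proportional back-adjustment can be sketched the same way as the point-based version, with multiplication in place of addition. The function name, input layout and all prices other than the 1000 and 990 roll-day closes from the example are hypothetical.

```python
import math


def ratio_adjust(segments, roll_ratios):
    """Build a ratio-adjusted continuous series (hypothetical sketch).

    segments:    list of price lists, oldest contract first; each list ends
                 on that contract's roll day.
    roll_ratios: roll_ratios[i] is the new contract's price divided by the
                 old contract's price on the roll day after segment i.
    """
    adjusted = []
    factor = 1.0
    # Walk backward so the newest contract is left unadjusted.
    for i in range(len(segments) - 1, -1, -1):
        if i < len(segments) - 1:
            factor *= roll_ratios[i]
        adjusted.append([price * factor for price in segments[i]])
    adjusted.reverse()
    return [price for segment in adjusted for price in segment]


# Old contract closes at 1000 on the roll day; new contract is priced at 990.
old_contract = [980, 1000]
new_contract = [995, 1005]      # made-up prices after the roll
series = ratio_adjust([old_contract, new_contract], [990 / 1000])

# Percentage moves in the old data survive the adjustment.
original_move = old_contract[1] / old_contract[0]
adjusted_move = series[1] / series[0]
```

The old prices become roughly 970.2 and 990, so the point difference between them shrinks while the percentage move is unchanged.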
Some issues remain, especially the difference between the actual tick size and the calculated tick size relative to the minimum tick size (a contract-specific parameter you need to consider in your strategy and testing). In addition, adjusted prices no longer fall on valid ticks, and because the entire price series has been altered, dollar profit/loss cannot be calculated from these contracts, nor can dollar-based targets be used. However, percentage-based calculations for both trade results and stops can be done.
Other methods include forward-adjusted contracts, which apply the difference going forward; compressing the adjustment over a few days to maintain price levels over most of the contract history; and manually adjusting each price according to the difference on that specific date, not just the rollover date.
History does not exactly repeat itself. However, history is all we have. Patterns and characteristics do persist, but just as a child isn’t a perfect replica of its parents, future market movements don’t exactly follow the lead of the past. Traits are reliably predictable, however, if we base those predictions on a logical premise, not random relationships. Developing these premises and testing them under proper backtesting assumptions is the key to trading system development. In the next issue, we’ll take a closer look at the specifics of these assumptions, and how you can make sure you formulate the right ones.
Murray A. Ruggiero Jr. is the author of "Cybernetic Trading Strategies" (Wiley). E-mail him at ruggieroassoc@aol.com.
