In "Beware the backtest" (June 2011), we discussed many issues relating to data, as well as how the markets have changed over the past few years because of electronic trading. We discussed the effects the closing of the day session had on trading when open outcry trading dominated and how the shift to electronic order matching changed the landscape. We also discussed how to deal with the gaps between contracts.
Here we will expand on those concepts and demonstrate how to formulate the proper foundation for your systems. First, let's pick up on the topic of rolling contract data — shifting from one contract month to the next — and how that can affect the results of a system backtest. Consider the following basket of markets: copper, corn, cotton, crude oil and the euro. Now, let's look at rolling the contracts based on three different sets of criteria, discussed in detail in the June issue: Volume/open interest, open interest and fixed dates.
We are going to use combined pit/electronic CSI data for our analysis. It is not easy to merge pit-only and electronic-only results based on when electronic volume surpasses pit volume. The CSI combined data reflects a 24-hour electronic market with the pit values for high and low included. If the pit values are outside the 24-hour electronic range, the pit values are used.
In addition, we will use Pinnacle merge contracts based on the fixed dates. Again, this approach uses the pit contract until the electronic market has more volume. Our merge dates are as follows: Euro: Sept. 25, 2007; crude oil: Dec. 15, 2008; cotton: July 11, 2008; corn: April 2, 2007; and copper: May 13, 2008.
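Whichever roll criterion is used, the gap between the expiring and the new contract must be removed so the continuous series shows no artificial jump at the roll. A minimal Python sketch of back-adjusting at a roll date (the function name, date keys and prices are hypothetical, for illustration only):

```python
def back_adjust(old, new, roll_date):
    """Splice two contract series at roll_date, shifting the old contract's
    prices by the close-to-close gap on the roll date so no artificial jump
    appears in the continuous series. Each series is a dict {date: close}."""
    gap = new[roll_date] - old[roll_date]              # price gap between contracts
    adjusted = {d: p + gap for d, p in old.items() if d < roll_date}
    adjusted.update({d: p for d, p in new.items() if d >= roll_date})
    return adjusted

# Hypothetical data: the new contract trades 3.0 over the old on roll day 3.
old_month = {1: 100.0, 2: 101.0, 3: 102.0}
new_month = {2: 104.0, 3: 105.0, 4: 106.0}
continuous = back_adjust(old_month, new_month, roll_date=3)
```

Every pre-roll price is shifted up by the gap, so day-to-day changes (which drive system signals) are preserved while the splice point is seamless.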
We will test this data using two different systems and the same parameters for all the data sets. (Because we are looking at the differences in performance between the four data sets and not attempting to identify a successful strategy, we will not deduct for slippage and commission.)
Testing the rolls
We'll start our comparison with a simple channel breakout system. Here's the code:
Sub ChanBreakoutBreakdown(SLen As Integer)
We will run our results for Jan. 3, 1989, to June 8, 2011, using 20 for the SLen variable. Results are shown in "Breakout comparison".
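The full breakout routine is not reproduced above; for illustration, here is a minimal Python sketch of the same channel-breakout logic, assuming plain price lists (the function name and arrays are hypothetical): go long when the close exceeds the highest high of the prior SLen bars, go short when it breaks the lowest low.

```python
def channel_breakout_signals(highs, lows, closes, slen=20):
    """Return +1 (go long), -1 (go short) or 0 for each bar: long when the
    close exceeds the highest high of the prior slen bars, short when it
    breaks below the lowest low of the prior slen bars."""
    signals = [0] * len(closes)
    for i in range(slen, len(closes)):
        if closes[i] > max(highs[i - slen:i]):
            signals[i] = 1          # upside breakout of the channel
        elif closes[i] < min(lows[i - slen:i]):
            signals[i] = -1         # downside breakout of the channel
    return signals
```

Note that this system needs the high and low of each bar, not just the close, which is why roll-induced distortions in the daily range feed directly into its results.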
Looking at the first three tables, which use the CSI data, results for the same system over the same testing period differ by 15%-20% across all markets, but on a market-by-market basis the differences are much larger. For example, corn's net profit ranges from $2,125 to $16,937, nearly an eightfold difference. The maximum drawdowns are similar across all three rollover methods, but net profit is not. The number of trades differs by about 10%; this is the cause of the discrepancies in the statistics.
The results for the Pinnacle merge data that combine the pit-only and electronic-only data, using the same fixed date rollover as discussed above, paint a different picture. You can see that using this method of assembling the data, we actually lose money on corn.
Next, we'll look at the results for a triple moving average system. Here is the code:
Sub ThreeMACrossoverBreakdown(SLen As Integer, MLen As Integer, LLen As Integer)
    Dim ShortAve As BarArray
    Dim MedAve As BarArray
    Dim LongAve As BarArray
    ShortAve = Average(Close, SLen, 0)
    MedAve = Average(Close, MLen, 0)
    LongAve = Average(Close, LLen, 0)
    If ShortAve > MedAve And MedAve > LongAve Then Buy("ThreeMAB", 1, 0, Market, Day)
    If ShortAve < MedAve And MedAve < LongAve Then Sell("ThreeMAS", 1, 0, Market, Day)
End Sub
We are going to use six, 15 and 30 for the moving average lengths in all tests. The results are shown in "Moving violations". The key difference between these two strategies is that the channel breakout system uses the high and low, while the moving average crossover is based only on the close.
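For readers working outside TradersStudio, the same crossover logic can be sketched in Python (the helper names are hypothetical; signals are evaluated bar by bar on closes only):

```python
def sma(prices, length, i):
    """Simple moving average of the `length` prices ending at bar i."""
    return sum(prices[i - length + 1:i + 1]) / length

def triple_ma_signal(closes, i, slen=6, mlen=15, llen=30):
    """+1 when short > medium > long (aligned uptrend),
    -1 when short < medium < long (aligned downtrend), else 0."""
    s = sma(closes, slen, i)
    m = sma(closes, mlen, i)
    l = sma(closes, llen, i)
    if s > m > l:
        return 1
    if s < m < l:
        return -1
    return 0
```

Because every input is a close, this system is insulated from roll distortions in the daily high/low, which is consistent with its smaller variation across roll methods.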
Corn again shows large differences, either making or losing money depending on the roll method. Cotton shows quite a bit of variation, while the other markets do not vary much under the triple moving average. Overall, the variation is smaller than with the channel breakout system.
Looking at the Pinnacle data results table in "Moving violations," we see larger variation. The difference is more than $60,000 between the fixed-date rolls using the combined pit/electronic data and the Pinnacle merged data. The euro has the biggest difference, making up about $40,000 of the total. Some of this is because the early electronic sessions for the euro contain odd ticks during illiquid periods. We also can see that the drawdown numbers in some of the markets differ.
The results demonstrate the significance of the type of data you use in your backtesting. For the rigorous trader, the answer should be to focus on the data that best reflect reality. While you may be tempted to discount the odd ticks in the euro electronic market as outliers, the data are the data. If you were trading this market at the time, those would have been the prices you would have seen. For historical testing, using pit-only data and then switching to electronic data once electronic volume reached critical mass is the most accurate reflection of the past and, perhaps by extension, the future.
When we roll
To better understand the process of rolling from one contract to the next, let's look at some sample roll dates for different methods for crude oil:
Crude volume/open interest:
• May 19, 2011 (roll to July)
• April 9, 2011 (roll to June)
• March 21, 2011 (roll to May)
Open interest only:
• May 10, 2011
• April 12, 2011
• March 9, 2011
Fixed date:
• May 11, 2011
• April 11, 2011
• March 11, 2011
We can see that (a) fixed dates produce consistent rolls; (b) open interest only produces fairly consistent results for crude and (c) volume/open interest can create issues and consistently rolls eight to 10 days later. Although open interest only may roll at a similar time to fixed date, plus or minus a couple of days, it still can change results by 5% to 10%.
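A common way to locate a volume/open-interest roll date is to switch on the first day the back month's combined volume and open interest exceed the front month's. A sketch with hypothetical data (function name and figures are illustrative, not the exchange's actual numbers):

```python
def voloi_roll_date(front, back):
    """front, back: lists of (date, volume, open_interest) tuples aligned by
    day. Roll on the first day the back month's volume + open interest
    exceed the front month's."""
    for (day, f_vol, f_oi), (_, b_vol, b_oi) in zip(front, back):
        if b_vol + b_oi > f_vol + f_oi:
            return day
    return None  # no crossover in the window: stay in the front contract

# Hypothetical crude oil data: activity migrates to the back month over three days.
front_month = [("05-09", 300, 400), ("05-10", 250, 350), ("05-11", 150, 200)]
back_month = [("05-09", 100, 150), ("05-10", 200, 300), ("05-11", 260, 320)]
roll_day = voloi_roll_date(front_month, back_month)
```

Because the crossover day depends on how quickly liquidity migrates in each market, this method produces the inconsistent roll dates seen above.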
Now consider corn:
Corn volume/open interest:
• April 19, 2011 (roll to July)
• February 24, 2011 (roll to May)
Open interest only:
• April 8, 2011
• February 10, 2011
Fixed date:
• April 25, 2011
• February 23, 2011
With corn, we see that fixed-date rolls come later than either volume/open interest or open interest, and that there is no consistency between those two methods. Open interest only rolls much earlier, while volume/open interest rolls at about the same time as fixed date for the May contract but with a six-day difference for the July contract.
Different roll methods produce different roll dates and looking across the different markets, there is no consistent pattern. This is why fixed-date rollovers make the most sense — perhaps they're the least of all evils.
Stop orders are another issue that continues to plague electronic markets. Early electronic markets were not very liquid, so stop orders could suffer significant slippage. Thus, electronic markets forced a change from stop orders to stop-limit orders.
Electronic markets establish bands on stop orders limiting the amount of slippage traders experience. If someone places a buy stop in corn at $6.70 and the "band" is 5¢, then the worst fill the person would receive is $6.75. If corn were to open one session through the stop — let's say at $6.80 — then the $6.70 stop would not be filled and would work as a $6.75 buy limit.
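The band logic from the corn example can be modeled directly. A sketch (prices in dollars; the 5¢ protection band is set by the exchange, and the function name is hypothetical):

```python
def buy_stop_with_band(stop, band, trade_price):
    """Model a band-limited buy stop: once trade_price touches the stop, the
    order becomes a buy limit at stop + band. Returns the fill price, or
    None when the order is not filled on this trade."""
    if trade_price < stop:
        return None            # stop not yet triggered
    limit = stop + band        # worst acceptable fill
    if trade_price <= limit:
        return trade_price     # filled within the protection band
    return None                # market gapped through the band: rests as a limit order

# Corn example from the text: buy stop at $6.70 with a 5-cent band.
fill = buy_stop_with_band(6.70, 0.05, 6.72)       # filled at 6.72
no_fill = buy_stop_with_band(6.70, 0.05, 6.80)    # opens through: rests at 6.75 limit
```

This matters for backtesting: a simulator that fills every stop at the stop price will overstate results in gappy markets, while one that models the band will leave the order working, as the live market would.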
The case for system testing
Considering the difference we saw in hypothetical results based on data roll methods, it's understandable to question whether your backtest results have been valid. In light of this, what does backtesting prove?
When we trade futures and display hypothetical results, we see a disclaimer about the limitations of hypothetical performance and the fact that past results do not repeat. The limitations of hypothetical results are (a) partly due to the lack of a data standard and (b) partly due to issues with the system testing protocol itself (such as the exploitation of hindsight). Given how easy it is for an untrained researcher to fall into these traps, it's easy to see how system trading might get a bad rap.
However, the purpose of system testing is not to prove that your idea works; it is to prove that your idea did not fail. This simple point is quite important: If a system idea does not work well when backtested, it cannot work in reality. Assuming you follow the rules of proper testing and a scientific approach, you can then use this analysis to verify your theory.
If your theory does not perform as expected, trash it. Don't keep changing your rules to make it work. If it is not performing as expected, it not only means that it is unprofitable, but that it is fundamentally broken. For example:
- An intermarket system whose core logic wins less than 50% of its trades should be tossed even if it is making money. Because the relationship itself should be based on an economic reality, an intermarket divergence system should win more than 50% of its trades.
- Likewise, a trend-following system that has a high winning percentage but a win/loss ratio around 1.00 should send up some red flags. The quick, and probably correct, approach would be to discard it and move on — its reality does not match the underlying theory. Therefore, something is happening here that you don't understand. If you attempt to achieve that understanding by modifying and retesting the system, you are falling into the curve-fitting trap — probably the system developer's greatest folly.
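These sanity checks are easy to automate. A sketch that flags both cases above from a list of per-trade profits and losses (the function name, thresholds and sample trades are hypothetical):

```python
def red_flags(pnls, expect_win_rate_above=None, expect_winloss_above=None):
    """Return warnings when a system's trade statistics contradict its
    underlying theory. pnls: per-trade profits (+) and losses (-)."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    warnings = []
    win_rate = len(wins) / len(pnls)
    if expect_win_rate_above is not None and win_rate <= expect_win_rate_above:
        warnings.append("win rate below theory")
    if wins and losses:
        # average win divided by average loss (in absolute terms)
        winloss = (sum(wins) / len(wins)) / abs(sum(losses) / len(losses))
        if expect_winloss_above is not None and winloss <= expect_winloss_above:
            warnings.append("win/loss ratio below theory")
    return warnings

# An intermarket divergence system should win more than half its trades.
flags = red_flags([100, -50, 120, -60, 90, -40, -70], expect_win_rate_above=0.5)
```

The point is not the arithmetic but the discipline: the expectation comes from the theory before testing, so a violation signals a broken premise rather than an invitation to tweak parameters.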
System development should be approached scientifically. The scientific method is a process of asking and answering questions through observations and experiments:
- Ask questions: Are silver prices predictive of bonds and negatively correlated? Are utility stocks predictive of bonds and positively correlated? How do breakouts of different bands around a moving average work in capturing real trades and avoiding false breakouts?
- Do background research: Research what others have done in this area. Look at academic papers as well as working papers from the Federal Reserve for ideas, not just general trading publications and books.
- Theory construction: Create your trading rules based on your hypothesis. For example, use intermarket divergence as a concept to see if silver prices or utility stocks are predictive of bond prices. Similarly, an envelope around a moving average based on true range can be used to create a trend-following system: a breakout of the upper channel is a long entry, a breakout of the lower channel is a short entry, and the moving average line is the exit. Because breakouts are rare, we must determine how many true ranges contain 90%-95% of the data.
- Test your theory: Run your tests.
- Results analysis and conclusion: Analyze your results, including the optimization results. Examine the optimization surface and judge robustness. Make sure results match the theory. For example, in the case of our trend-following system, we want to find those parameters for which the channel contains 90% of the data and that also make the most money.
- Be fair: It is important for your experiment to be a fair test. A "fair test" occurs when you change only one factor (variable) and keep all other conditions the same.
- Address issues: Identify the problems your testing uncovers and develop tools that help you approach system development as a scientist would.
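The envelope-width question raised in the theory-construction step, how many true ranges contain 90%-95% of closes around the moving average, can be answered empirically. A sketch with hypothetical inputs (the function name and sample series are illustrative):

```python
import math

def envelope_multiple(closes, mas, true_ranges, coverage=0.90):
    """Find the smallest true-range multiple k such that at least
    `coverage` of the closes fall within ma +/- k * true_range.
    closes, mas, true_ranges: aligned per-bar lists."""
    # distance of each close from its moving average, in true-range units
    units = sorted(abs(c - m) / tr
                   for c, m, tr in zip(closes, mas, true_ranges))
    idx = max(0, math.ceil(coverage * len(units)) - 1)
    return units[idx]

# Hypothetical bars: moving average 10.0, true range 1.0 on every bar.
k = envelope_multiple([10.5, 11.0, 11.5, 12.0, 15.0],
                      [10.0] * 5, [1.0] * 5, coverage=0.8)
```

Fixing the band width from the data distribution, before optimizing anything, keeps the entry rule tied to the theory (breakouts should be rare) rather than to whatever width happened to be most profitable in hindsight.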
Building and trading systems can be profitable, but it also potentially can be time-wasting and expensive. Follow the steps laid out above — and be honest with yourself with regard to your approach and results — and you will find yourself on the long but rewarding path of trading system development.
Murray A. Ruggiero Jr. is the author of "Cybernetic Trading Strategies" (Wiley). E-mail him at email@example.com.