Machine learning models require educated inputs

September 29, 2016 09:00 AM
Building an effective neural-network-based trading system requires not just the proper tools but an understanding of the markets you’re trading.

Supervised learning methods include back propagation neural networks, support vector machines and machine induction algorithms such as C4.5 and rough sets. These are all examples of supervised learning because they need a target answer from which to learn the classification or the model’s predictions. When it comes to trading applications, these methods remain relatively unexplored.

Trading goals are not as simple as predicting direction. Many factors determine whether a target is good. Consider the distribution of errors. Without accounting for it, you could have a losing system despite predicting direction correctly 60% of the time, because the few wrong calls can cost more than the many right calls earn.
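A quick expectancy calculation shows how that can happen. The average win and loss sizes below are hypothetical numbers chosen only for illustration:

```python
def expectancy(hit_rate, avg_win, avg_loss):
    """Expected profit per trade.

    hit_rate: fraction of trades where direction was called correctly
    avg_win:  average profit on a correct call
    avg_loss: average loss on a wrong call, given as a positive number
    """
    return hit_rate * avg_win - (1.0 - hit_rate) * avg_loss


# Hypothetical: right 60% of the time, but wrong calls cost twice what right calls earn.
print(expectancy(0.60, 500.0, 1000.0))  # 0.6 * 500 - 0.4 * 1000, about -100 per trade
```

Direction accuracy alone says nothing about the sign of this number; the size of the errors matters just as much.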

Signals vs. targets

When designing a target for our supervised learning algorithm, we need to ask: What do we want to teach? Should it forecast direction, identify patterns, detect market turning points or capture more complex targets? First, let’s consider market direction.

Consider intermarket divergence, which was pioneered here more than 20 years ago. In 1995, developing futures-based trading systems with backtests based on continuous contracts (time series that splice together numerous individual futures contracts) was tricky because each futures contract has a beginning and an end date.

The tricky part of stringing contracts together is that they rarely trade at the same price levels. This creates a price gap where one contract ends and the next begins. There are various ways to cope with this issue. One is to ignore it: in a so-called “splice chart,” there will be visible price jumps where contracts are joined. Another is to create some type of weighted average between the contracts to smooth out the jumps. These “continuous charts” are fine for intermarket divergence because we aren’t measuring absolute price levels but rather the ratio between markets.
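The two ways of joining contracts can be sketched as follows. This assumes both contracts are quoted on the same dates (aligned lists), and the linear blend is just one hypothetical weighting scheme, not a standard definition:

```python
def splice(front, back, roll_index):
    """'Splice chart': front contract before roll_index, back contract from then on.
    Any price gap between the contracts appears as a single jump."""
    return front[:roll_index] + back[roll_index:]


def blended(front, back, start, end):
    """Hypothetical weighted-average roll: front contract before `start`, back
    contract from `end` on, and a linearly shifting weight in between, so the
    roll gap is spread over several bars instead of one."""
    out = []
    for i, (f, b) in enumerate(zip(front, back)):
        if i < start:
            w = 0.0
        elif i >= end:
            w = 1.0
        else:
            w = (i - start + 1) / (end - start + 1)
        out.append((1 - w) * f + w * b)
    return out


front = [100, 101, 102, 103, 104, 105, 106]
back = [110, 111, 112, 113, 114, 115, 116]  # back month trades 10 points higher
print(splice(front, back, 3))  # [100, 101, 102, 113, 114, 115, 116]
print(blended(front, back, 2, 5))
```

The spliced series jumps 11 points at the roll; the blended one moves in 3.5-point steps across the blend window.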

The rules for intermarket divergence are as follows:
For positively correlated markets, if the intermarket is in an uptrend and the traded market is in a downtrend, then buy the traded market; if the intermarket is in a downtrend and the traded market is in an uptrend then sell the traded market.

For negatively correlated markets, if the intermarket is in an uptrend and the traded market is in an uptrend, then sell the traded market; if the intermarket is in a downtrend and the traded market is in a downtrend, then buy the traded market.

You can use various concepts to define trend direction. Most of my work uses price relative to a moving average. Mathematically, when Price - Average(Price,X,0), where X is the moving-average length, is above zero, that market is in an uptrend. When it is below zero, the market is in a downtrend.
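Here is a minimal Python sketch of the trend definition and the divergence rules above. It uses a simple moving average as a stand-in for Average(Price,X,0) (the 0 is the bar offset, so the average includes the current bar); the function names are hypothetical:

```python
def sma(values, length):
    """Simple moving average including the current bar; None until enough bars exist."""
    return [
        sum(values[i + 1 - length:i + 1]) / length if i + 1 >= length else None
        for i in range(len(values))
    ]


def trend(prices, length):
    """+1 uptrend (price above its average), -1 downtrend (below), 0 otherwise."""
    out = []
    for price, avg in zip(prices, sma(prices, length)):
        if avg is None or price == avg:
            out.append(0)
        else:
            out.append(1 if price > avg else -1)
    return out


def divergence_signal(traded_trend, inter_trend, positively_correlated=True):
    """+1 = buy the traded market, -1 = sell it, 0 = no divergence."""
    if positively_correlated:
        if inter_trend == 1 and traded_trend == -1:
            return 1
        if inter_trend == -1 and traded_trend == 1:
            return -1
    else:
        if inter_trend == 1 and traded_trend == 1:
            return -1
        if inter_trend == -1 and traded_trend == -1:
            return 1
    return 0


# Toy data: the intermarket rises while the traded market falls -> buy signals
traded = [10, 9, 8, 7, 6]
inter = [1, 2, 3, 4, 5]
print([divergence_signal(t, i) for t, i in zip(trend(traded, 3), trend(inter, 3))])  # [0, 0, 1, 1, 1]
```

For T-bonds traded against UTY, each market gets its own moving-average length, which is what the optimization below varies.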

For example, consider the 30-year Treasury bond and the Philadelphia Electrical Utility Average (UTY). The relationship between utility stocks and T-bonds has been one of the most stable over the past 20 years. Our analysis extends back to Sept. 22, 1987. The parameters have remained robust, with moving-average lengths of eight and 20 working well from 2010 until today. Indeed, the parameters used in 1997 (8, 16) also perform well today.

Remember, however, that our experiment is to see how making money relative to a viable drawdown correlates with a traditional performance gauge, such as predicting market direction. We will look at each parameter set’s performance and compare it with price direction N bars into the future using the optimizer in TradersStudio (see “Best measure,” below, for the code).

We will optimize the moving-average lengths for T-bonds and utility stocks, as well as how far ahead to look, to find which intermarket signal generates the highest percentage of direction-forecasting success. The parameter sets with the highest net profit and with the highest net profit/drawdown ratio did not do the best at forecasting direction. The eight, 20 set of parameters made the most money, but it managed only a 51.3% success rate predicting price direction one bar into the future and a 54.67% success rate 10 bars into the future.
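One plausible way to score a signal on direction, sketched in Python. This is not the TradersStudio code from “Best measure”; it simply assumes a signal of +1 (bullish), -1 (bearish) or 0 (no view) on each bar and checks it against the price move N bars later:

```python
def direction_hit_rate(signals, closes, horizon):
    """Share of bars where a nonzero signal agrees with the sign of
    closes[t + horizon] - closes[t]. Bars with no signal or no price
    change are skipped."""
    hits = total = 0
    for t in range(min(len(signals), len(closes) - horizon)):
        s = signals[t]
        move = closes[t + horizon] - closes[t]
        if s == 0 or move == 0:
            continue
        total += 1
        if (s > 0) == (move > 0):
            hits += 1
    return hits / total if total else float("nan")


signals = [1, 1, -1, -1]
closes = [10, 11, 12, 11, 10]
print(direction_hit_rate(signals, closes, 1))  # 1.0: every signal matched the next move
```

Running this for horizons of one and 10 bars gives the kind of one-bar and 10-bar success rates quoted above.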

During our testing period, T-bonds had a strong upward bias: over a 10-day window, the Treasury market was up 55.5% of the time. This means that if we always said the market would be up, we would be right 55.5% of the time. This buy-and-hold strategy would make about $160,000, so making more than $300,000, as our system does, requires considerable skill, especially since that same system is worse at picking market direction.

Consider this hypothetical scenario: a hat full of ping pong balls, 55.5% of which say “up” and 44.5% of which say “down.” Let’s assume we have 4,845 trading days. This means we should have about 2,688 up days and 2,156 down days. The expected winning percentage would be as follows:

(Estimated up days * Up percentage) + (Estimated down days * Down percentage) = Correct forecasts

2,688 * 0.555 + 2,156 * 0.445 = 1,491.84 + 959.42 = 2,451.26 correct forecasts

2,451.26 / 4,845 ≈ 50.59%

This is only 50.59% accurate, which means that the difference between being correct on direction 51% vs. 55% of the time is statistically significant. It does not take much of an edge to make money. However, a huge edge does not always make the most money.
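The same calculation in closed form: if forecasts are drawn at random with the same up/down mix as the market, the expected hit rate is p² + (1 - p)², where p is the fraction of up days. A short sketch:

```python
def random_forecaster_accuracy(up_fraction):
    """Expected hit rate of forecasts drawn at random with the market's own
    up/down mix: P(up) * P(say up) + P(down) * P(say down) = p**2 + (1 - p)**2."""
    down_fraction = 1.0 - up_fraction
    return up_fraction ** 2 + down_fraction ** 2


def expected_correct_days(up_days, down_days, up_fraction):
    """The hand calculation: up days * P(say up) + down days * P(say down)."""
    return up_days * up_fraction + down_days * (1.0 - up_fraction)


print(random_forecaster_accuracy(0.555))                # about 0.506
print(expected_correct_days(2688, 2156, 0.555) / 4845)  # about 0.5059
```

The closed form gives about 50.6%; the small difference from 50.59% comes from the whole-day counts used in the hand calculation, which sum to 4,844 rather than 4,845.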

“Past imperfect” (below) shows how much money we could make if we could perfectly predict “close - close[x],” where “x” is the number of days into the future we predict the market, between one and 15. Even perfect prediction does not produce 100% winning trades. Drawdowns range from a bit above $2,300 to more than $10,000. Our optimal forecast window for intermarket analysis is 10, but even when predicting direction perfectly, the 10-day price window is profitable only 83% of the time and has an $11,000 drawdown.
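A perfect-foresight test like this can be sketched as follows. The trade construction (hold long while the x-day-ahead change is positive, short while it is negative, one trade per run of identical positions, no costs) is an assumption for illustration, not necessarily how the “Past imperfect” results were produced:

```python
import random


def perfect_foresight_trades(closes, x):
    """Per-trade P&L (price points, no costs) when the sign of
    closes[t + x] - closes[t] is known in advance at every bar t."""
    trades = []
    current_pos, current_pnl = 0, 0.0
    for t in range(len(closes) - x):
        diff = closes[t + x] - closes[t]
        pos = (diff > 0) - (diff < 0)  # +1 long, -1 short, 0 flat
        if pos != current_pos:
            if current_pos != 0:
                trades.append(current_pnl)  # close the previous trade
            current_pos, current_pnl = pos, 0.0
        current_pnl += pos * (closes[t + 1] - closes[t])  # mark to market over the next bar
    if current_pos != 0:
        trades.append(current_pnl)
    return trades


def win_rate(trades):
    return sum(1 for p in trades if p > 0) / len(trades) if trades else float("nan")


# Synthetic random-walk prices, purely for illustration
random.seed(1)
closes = [100.0]
for _ in range(2000):
    closes.append(closes[-1] + random.gauss(0, 1))
for x in (1, 5, 10, 15):
    print(x, round(win_rate(perfect_foresight_trades(closes, x)), 3))
```

With a one-bar horizon every trade wins in this construction; longer horizons hold positions through adverse bars, which is how perfectly correct direction calls can still produce losing trades and drawdowns.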

This entire discussion is significant because it gets to the heart of our broader discussion regarding training our neural networks. What we choose to train them against is just as important as how we train them using modern trading technology. Is predicting market direction, which many people attempt to do with advanced methods, really the best target to teach our trading systems?

Predicting everything & nothing

Statistics can be misleading in developing models. For example, the average difference between today’s low and tomorrow’s low in bonds is about a tick. So, if we develop a method that claims its average error in predicting the T-bond is just a tick, it means nothing: simply forecasting that tomorrow’s low will equal today’s low does as well.
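Before trusting a reported error, compare it with the no-skill forecast. A minimal sketch, assuming the daily lows are in a list in price units (the numbers are hypothetical):

```python
def mean_abs_error(actual, predicted):
    """Average absolute forecast error, in the same units as the prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_low_error(lows):
    """Error of the no-skill forecast 'tomorrow's low = today's low'.
    A model's error only means something relative to this baseline."""
    return mean_abs_error(lows[1:], lows[:-1])


lows = [100.0, 101.0, 100.0, 102.0]  # hypothetical daily lows
print(naive_low_error(lows))
```

A model is adding information only if its error is clearly below this baseline on the same data.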

When viewing average values in terms of forecasting, we can use a method called “conditional probabilities” to make rudimentary forecasts. For example, to predict market turning points, we would look at the average value of an indicator such as the relative strength index (RSI) at turning points that occur under a given technical condition, such as price above a moving average. To determine whether this is useful, we need to examine the standard deviation (STD) of the indicator values at the turning points. The average value at the turning point needs to be at least 2.5 times the STD to be predictive. So, if swing-low turning points occur at an average RSI of 30, we want to see an STD below 12 (30 / 2.5) for it to be predictive.
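The 2.5 × STD rule of thumb can be checked with a few lines of Python. The swing-low definition (a low below the n bars on either side) and the use of the sample standard deviation are assumptions; the RSI values at the turns would come from your own data:

```python
import statistics


def swing_lows(lows, n=2):
    """Indices of bars whose low is strictly below the lows of the n bars on each side."""
    return [
        t for t in range(n, len(lows) - n)
        if all(lows[t] < lows[t + k] for k in range(-n, n + 1) if k != 0)
    ]


def turning_point_test(values_at_turns, ratio=2.5):
    """Return (is_predictive, mean, std) for indicator values sampled at turning points."""
    mean = statistics.mean(values_at_turns)
    std = statistics.stdev(values_at_turns)  # sample standard deviation
    return abs(mean) >= ratio * std, mean, std


# Hypothetical RSI readings at swing lows that occurred with price above its moving average
print(turning_point_test([28, 31, 30, 33, 27, 29]))
```

Tightly clustered readings pass the test; widely scattered readings with the same average would not.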

Back propagation and other feed-forward supervised neural network algorithms require a differentiable error function: its derivative is taken and used to propagate the error back through the network. This normally produces models that minimize RMS error, which is not ideal for trading. When we use these algorithms in trading, we need to train, test in a trading system framework, save the results of the backtest and continue training until we get the best backtest results.
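The train-then-backtest loop can be sketched in plain Python. The one-layer linear model, the learning rate and the sign-of-forecast trading rule are simplifying assumptions standing in for a real network and trading system; the point is that gradient descent minimizes squared error while the model we keep is chosen by backtest profit on separate test data:

```python
import random


def forecast(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def backtest_profit(weights, features, next_returns):
    """Go long when the forecast is positive, short when negative; no costs."""
    profit = 0.0
    for x, r in zip(features, next_returns):
        f = forecast(weights, x)
        if f > 0:
            profit += r
        elif f < 0:
            profit -= r
    return profit


def train_select_by_backtest(train_x, train_y, test_x, test_r, epochs=50, lr=0.01, seed=0):
    """Stochastic gradient descent on squared error; after every epoch the model
    is backtested on the test data and the most profitable weights are saved."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(train_x[0]))]
    best_w = list(weights)
    best_profit = backtest_profit(weights, test_x, test_r)
    for _ in range(epochs):
        for x, y in zip(train_x, train_y):
            err = forecast(weights, x) - y
            # gradient of err**2 with respect to w_i is 2 * err * x_i
            weights = [w - lr * 2 * err * xi for w, xi in zip(weights, x)]
        profit = backtest_profit(weights, test_x, test_r)
        if profit > best_profit:  # keep the best backtest, not the lowest error
            best_w, best_profit = list(weights), profit
    return best_w, best_profit
```

In a real setup the “backtest” step would be the full trading system, with slippage and commission, rather than a sign-of-forecast rule.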

The key in developing trading models is the outliers. We want to use methods that square errors because they penalize missed forecasts on outliers more heavily. When developing these models, forecasting a big “up day” that turns out to be a big “down day,” or vice versa, is the worst that can happen.
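A small comparison shows why squared error is the harsher judge of a big miss. The two error sets are hypothetical:

```python
def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mean_squared_error(errors):
    return sum(e * e for e in errors) / len(errors)


steady = [1.0, -1.0, 1.0, -1.0]      # always a little off
one_big_miss = [0.0, 0.0, 0.0, 4.0]  # perfect on quiet days, badly wrong once

print(mean_absolute_error(steady), mean_absolute_error(one_big_miss))  # 1.0 1.0
print(mean_squared_error(steady), mean_squared_error(one_big_miss))    # 1.0 4.0
```

Both sets look identical by absolute error, but squaring makes the single large miss four times as costly.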

Old targets are new again

When we first developed neural networks for trading in the early 1990s, the idea was to predict future values of classic technical indicators. The main problem with classical indicators is lag, because they are often created or smoothed with moving averages. If we can correctly predict these indicators in the future, we essentially eliminate the lag.

Let’s look at the RSI. Here is how it is calculated manually:

RSI = 100 - (100 / (1 + RS))

RS = X-day EMA of up-day gains / X-day EMA of down-day losses

EMA (exponential moving average) is calculated as follows:

EMA = Price(t) * k + EMA(y) * (1 - k)
where t = today, y = yesterday, N = number of days in the EMA and k = 2 / (N + 1)
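The formulas above translate directly into Python. Two details are assumptions, since implementations differ: the EMA is seeded with its first input value, and an RSI with no down-day losses is set to 100 (or 50 when there are no moves at all). Note also that Wilder’s original RSI smooths with a constant of 1/N rather than 2/(N + 1).

```python
def ema(values, n):
    """EMA with k = 2 / (N + 1), seeded with the first value."""
    k = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(v * k + out[-1] * (1 - k))
    return out


def rsi(closes, n):
    """RSI = 100 - 100 / (1 + RS), RS = EMA of up-day gains / EMA of down-day losses.
    The result has one value per price change, so out[i] belongs to closes[i + 1]."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    out = []
    for g, l in zip(ema(gains, n), ema(losses, n)):
        if l == 0:
            out.append(100.0 if g > 0 else 50.0)
        else:
            out.append(100.0 - 100.0 / (1.0 + g / l))
    return out


print(rsi([1, 2, 1, 2, 1], 3))
```

For the strategy below, rsi(closes, 40) would give the 40-day RSI.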

Short-term RSIs are too noisy; long-term RSIs have too much lag. Let’s see if we can eliminate some of the lag in a long-term RSI and use it in a trend-following system. Consider the strategy below:

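Sub RSISimpleSystem(SLen,BuyLev,SellLev)
    Dim RSIValue As BarArray
    ' RSI of the close over SLen bars (assumes the built-in RSI(Price, Length, Offset))
    RSIValue = RSI(Close,SLen,0)
    ' Entries: go long above BuyLev, go short below SellLev
    If RSIValue>BuyLev Then Buy("",1,0,Market,Day)
    If RSIValue<SellLev Then Sell("",1,0,Market,Day)
    ' Exits: get flat when the RSI moves back through 50
    If RSIValue>50 Then ExitShort("","",1,0,Market,Day)
    If RSIValue<50 Then ExitLong("","",1,0,Market,Day)
End Sub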

Let’s run this on crude oil from Jan. 4, 1984, to May 13, 2016. We will use a 40-day RSI, buying when it crosses above 55 and selling when it goes below 40. We will get flat when it crosses 50. This simple strategy makes more than $179,000 before slippage and commission on 125 trades. The drawdown is a high $90,000 and the winning percentage is only 39.2%, but the profit factor is a high 2.05.

Now let’s pretend we can predict this 40-bar RSI into the future. If we could predict it perfectly just one day ahead, we would double our profits. Predicting it two bars into the future makes more than $590,000 (see “Predicting RSI,” below).
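The “perfect prediction” experiment amounts to feeding the unchanged trading rules an RSI series shifted forward in time. A minimal sketch, assuming an RSI list aligned bar for bar with the closes; the position rules mirror RSISimpleSystem, while the execution model (hold each bar’s position over the next bar, no costs) is a simplification:

```python
def rsi_level_positions(rsi_values, buy_level=55, sell_level=40, exit_level=50):
    """+1 long, -1 short, 0 flat after each bar, following the RSISimpleSystem rules."""
    pos, out = 0, []
    for r in rsi_values:
        if r > buy_level:
            pos = 1
        elif r < sell_level:
            pos = -1
        elif pos == -1 and r > exit_level:
            pos = 0  # exit short when the RSI recovers above 50
        elif pos == 1 and r < exit_level:
            pos = 0  # exit long when the RSI falls below 50
        out.append(pos)
    return out


def points_gained(positions, closes):
    """Hold each bar's position over the next bar's close-to-close change."""
    return sum(p * (closes[t + 1] - closes[t]) for t, p in enumerate(positions[:-1]))


def foresight_points(rsi_values, closes, bars_ahead=0, **levels):
    """Trade on the RSI value from bars_ahead bars in the future (perfect prediction).
    The last bars_ahead bars are dropped because their future RSI is unknown."""
    usable = len(closes) - bars_ahead
    signal = rsi_values[bars_ahead:bars_ahead + usable]
    return points_gained(rsi_level_positions(signal, **levels), closes[:usable])


rsi_toy = [50, 50, 56, 60, 60]
closes_toy = [10, 10, 12, 14, 14]
print(foresight_points(rsi_toy, closes_toy, 0), foresight_points(rsi_toy, closes_toy, 1))  # 2 4
```

Because only the input series changes, the same rules can be rerun with bars_ahead set to one, two, three or five bars to compare against the standard RSI.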

Once we predict just three bars into the future, which is reasonable on a 40-bar indicator, we get about 75% of the maximum increase and, based on profit factor, more than half of the performance of predicting five bars into the future. We can also see that any forward prediction reduces drawdown.

The problem with traditional strategies that predict price is that there is a high penalty for failing: if we miss our predictions or we’re late, the strategy falls apart completely. When we predict future indicator values, being one or two bars late is still profitable. The penalty of a failed prediction is minimized when we build a system around the standard version of the indicator, then develop predictive models and substitute their forecasts into the system.

Deciding what you want to predict when developing supervised learning models, especially neural network models, is an important step in machine learning. How the chosen target integrates with actual mechanical trading systems is one of the most important factors in building reliable models. If you want to use machine learning successfully in trading, you must take the time to learn not just the advanced tools but also the trading environment in which you’re working.

About the Author

Murray A. Ruggiero Jr. is the author of "Cybernetic Trading Strategies" (Wiley). E-mail him at