To use a neural network successfully in a trading application, it is necessary to start with a solid system that is viable without any advanced technologies. It doesn’t have to be a world-beater, but it has to work. Then, a neural network can be developed to improve the system.
To demonstrate this process, we will begin with an intermarket divergence model that trades the S&P 500 index using Treasury bonds as the predictive market.
Intermarket divergence systems are countertrend models that attempt to predict market turning points. This type of system produces a high percentage of winning trades but will sometimes miss major market moves when correlated markets do not diverge. Another problem with this type of system is that markets occasionally decouple, and a standard intermarket divergence system will then lose money unless a correlation filter is included in the system.
Below is the basic code for this system, written in TradersStudio Basic:
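Sub IntermarketSP500(SPLen, InterLen)
' TrMarket: S&P 500 close minus its SPLen-bar average
' InterMark: T-bond close (Independent1) minus its InterLen-bar average
Dim TrMarket As BarArray
Dim InterMark As BarArray
InterMark = Close Of Independent1 - Average(Close Of Independent1, InterLen, 0)
TrMarket = Close - Average(Close, SPLen, 0)
' S&P 500 below its average while T-bonds are above theirs: buy
If TrMarket < 0 And InterMark > 0 Then Buy("", 1, 0, Market, Day)
' S&P 500 above its average while T-bonds are below theirs: sell
If TrMarket > 0 And InterMark < 0 Then Sell("", 1, 0, Market, Day)
End Sub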
If the S&P 500 is moving down and T-bonds are moving up, the system buys.
If the S&P 500 is moving up and T-bonds are moving down, the system sells.
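For readers who want to check these rules outside TradersStudio, here is a minimal Python sketch of the same signal logic. It assumes two aligned lists of daily closes and treats Average(..., 0) as a simple moving average ending on the current bar; the function names are illustrative and are not part of TradersStudio.

```python
def sma(values, length, i):
    """Simple moving average of the `length` values ending at index i."""
    return sum(values[i - length + 1 : i + 1]) / length


def divergence_signals(sp_close, bond_close, sp_len=10, inter_len=10):
    """Return 1 (buy), -1 (sell) or 0 (no new signal) for each bar.

    tr_market mirrors TrMarket (S&P 500 close minus its average) and
    inter_mark mirrors InterMark (T-bond close minus its average).
    """
    warmup = max(sp_len, inter_len) - 1
    signals = []
    for i in range(len(sp_close)):
        if i < warmup:
            signals.append(0)  # not enough history for the averages yet
            continue
        tr_market = sp_close[i] - sma(sp_close, sp_len, i)
        inter_mark = bond_close[i] - sma(bond_close, inter_len, i)
        if tr_market < 0 and inter_mark > 0:
            signals.append(1)   # S&P weak, bonds strong: buy
        elif tr_market > 0 and inter_mark < 0:
            signals.append(-1)  # S&P strong, bonds weak: sell
        else:
            signals.append(0)
    return signals
```

The defaults of 10 for both lengths correspond to the parameter set the article settles on below.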
Historically, this system performed well until 1998 and then ran into problems. In its first incarnation, the SPLen input, which defines the length of the S&P 500 average, was 10, and the length for the intermarket average (T-bonds in this case) was 26. From 1998 through 2002, it had only one winning year and lost about $95,000. However, since Jan. 1, 2003, this parameter set has been doing well, making more than $110,000.
The original 10 and 26 values were set based on optimization runs back in 1994. Similar runs performed on the data available today suggest a value of 10 for both inputs. This set of parameters has done much better in the post-1998 period and has made $160,000 more than the original values. Here are the numbers, without deduction for slippage and commission:
Net Profit: $374,100
Trades: 387
Win Percent: 67.70%
Ave Trade: $966.67
Max Drawdown: $98,525
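The summary statistics above follow from the list of closed-trade profits and losses. Here is a short Python sketch of how they can be computed; it assumes drawdown is measured on closed-trade equity, since the article does not say whether open-trade equity was included.

```python
def performance_summary(trade_pnls):
    """Summarize a list of closed-trade profits and losses (dollars)."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    net = sum(trade_pnls)
    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    equity = peak = max_drawdown = 0.0
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)  # highest closed-trade equity so far
        max_drawdown = max(max_drawdown, peak - equity)
    return {
        "net_profit": net,
        "trades": len(trade_pnls),
        "win_percent": 100.0 * wins / len(trade_pnls),
        "avg_trade": net / len(trade_pnls),
        "max_drawdown": max_drawdown,
    }
```

As a consistency check, the reported average trade is simply net profit divided by the number of trades: 374,100 / 387 ≈ 966.67.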
The main difference is that this version of the system made more than $127,000 during the same 1998 through 2002 period. We will use this version of the model as the basis for this article.
FAR FROM PERFECT
Obviously, while profitable, this system needs to be improved before it is allowed to risk real capital. The trade that went short on June 7, 1999, started well and was up almost $5,000 before turning around and becoming a $12,000 loss (see “Good gone bad”).

This is a problem with intermarket divergence systems. From what we have seen, we have a good trading strategy with flaws that, if resolved, would make it much better. A neural network can help us eliminate those flaws.
To do that, we first have to develop a target that the neural network can predict and that, if predicted, will improve this model. Next, we need to develop preprocessing that produces inputs predictive of this target.
The first step is to build a shell system that uses the original system itself as its input. Here is the logic:
If TrMarket < 0 and InterMark > 0, then InterIndex = 1
If TrMarket > 0 and InterMark < 0, then InterIndex = -1
With these simple rules, we have created a raw indicator that buys when InterIndex is 1 and sells when it is -1. Using this indicator to generate buy and sell signals gives us the same results as the original system.
Next, we need to design a target that turns this signal off if a trade is likely to lose money. We do not change the sign of InterIndex; we simply set the target to 0 if the trade is likely to lose money. In those cases, we can exit the trade or elect not to take the signal. Here is the logic:
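If InterIndex = 1 and Average(Close[+lookahead] - Close, 3) < 0, then InterIndex = 0
If InterIndex = -1 and Average(Close[+lookahead] - Close, 3) > 0, then InterIndex = 0
Here, Close[+lookahead] denotes the close lookahead bars in the future, so the expression is a three-bar average of the look-ahead price change.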
If it can be predicted, this target can be used to filter our existing trading system, so our goal now is to predict it. Based on our analysis of the target's performance and predictability, we picked a look-ahead period of six bars. Believe it or not, the target changed InterIndex to 0 on most days: throughout the last 10 years, only 17% of the days kept a nonzero value. We achieved this without changing any -1 or 1 values based on looking into the future; we simply changed those values to zero if the trade was losing money during the look-ahead period. The improvement to the bottom line is staggering: this target produced $1.97 million instead of $370,000.
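The target construction can be sketched in Python as follows. This sketch assumes Average(..., 3) averages the look-ahead change over the current bar and the two bars before it, and it marks the final bars, where the future is not yet known, as None; both are assumptions rather than details given in the article.

```python
def filtered_target(inter_index, close, lookahead=6, avg_len=3):
    """Zero out InterIndex where the trade looks like a loser over the look-ahead.

    future_change[i] = close[i + lookahead] - close[i]. Its avg_len-bar
    average ending at bar i decides whether the signal is kept.
    Bars without enough future (or past) data get None.
    """
    n = len(close)
    future_change = [
        close[i + lookahead] - close[i] if i + lookahead < n else None
        for i in range(n)
    ]
    target = []
    for i in range(n):
        window = future_change[max(0, i - avg_len + 1) : i + 1]
        if len(window) < avg_len or None in window:
            target.append(None)  # target undefined at the edges of the data
            continue
        avg = sum(window) / avg_len
        if inter_index[i] == 1 and avg < 0:
            target.append(0)     # long signal that would have lost money
        elif inter_index[i] == -1 and avg > 0:
            target.append(0)     # short signal that would have lost money
        else:
            target.append(inter_index[i])
    return target
```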
More important, the network's output is not the core of the system, so a failure of the network will not cause the whole system to collapse. This type of fault-tolerant design is necessary in neural-network-based trading systems.
PREDICTIVE INPUTS
Now that we have selected our output, we need to find inputs that are predictive of it. There are several ways to tackle this chore, including domain expertise, statistical analysis and scatter charts.
In any case, the first step in trying to develop predictive inputs is to look at the components that make up your output. In this case, those components are:
1) Close of T-Bonds - Average(Close of T-Bonds,10)
2) Close of S&P500 - Average(Close of S&P500,10)
These two raw components made up the target before we looked into the future to turn it on and off. In our intermarket divergence model, they are combined into a mode as follows:
If Close of S&P500 - Average(Close of S&P500,10) < 0 and Close of T-Bonds - Average(Close of T-Bonds,10) > 0, then Mode = 1
If Close of S&P500 - Average(Close of S&P500,10) > 0 and Close of T-Bonds - Average(Close of T-Bonds,10) < 0, then Mode = -1
This leads to the question: Should we use the actual values of components 1 and 2, normalized versions of them, or only their signs, since the sign is what matters in calculating the mode?
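To make the three choices concrete, here is a Python sketch that encodes a component such as Close - Average(Close, 10) as raw values, scaled values or signs. Dividing by a trailing standard deviation is only one possible normalization (an assumption); using a trailing window keeps future data out of the input.

```python
import statistics


def encode_component(values, mode, window=20):
    """Encode a raw component in one of three ways.

    "raw":    values unchanged
    "scaled": each value divided by the stdev of the trailing `window` values
    "sign":   +1, -1 or 0 only
    """
    if mode == "raw":
        return list(values)
    if mode == "sign":
        return [(v > 0) - (v < 0) for v in values]
    if mode == "scaled":
        out = []
        for i in range(len(values)):
            if i < window - 1:
                out.append(None)  # not enough history for the stdev
                continue
            sd = statistics.stdev(values[i - window + 1 : i + 1])
            out.append(values[i] / sd if sd > 0 else 0.0)
        return out
    raise ValueError(f"unknown mode: {mode}")
```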
Another component of our output was the “n”-day future price trend. Because it is part of our target, we could use the n-day price trend ending today, which is the future trend shifted back “n” days, as another input.
Next, we should take our desired output, shift it back in time, and confirm that the shifted version does not contain any future data.
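Here is a small Python sketch of the shift-back idea. The future trend is computed only to show the alignment; the shifted version at each bar uses nothing later than that bar's close. Both function names are hypothetical.

```python
def future_trend(close, n):
    """n-day future trend at bar i: close[i + n] - close[i] (target use only)."""
    return [close[i + n] - close[i] if i + n < len(close) else None
            for i in range(len(close))]


def shifted_trend_input(close, n):
    """The future trend shifted back n bars.

    At bar i this equals close[i] - close[i - n], so it only uses data
    available at the close of bar i and is safe to use as an input.
    """
    ft = future_trend(close, n)
    return [ft[i - n] if i >= n else None for i in range(len(close))]
```

Changing a later close leaves every earlier value of the shifted input untouched, which is a quick way to confirm there is no look-ahead.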
Breaking the desired output into its components and shifting them so we are not looking into the future is a good place to begin finding inputs. As another example, if we wanted to predict an indicator such as the Average Directional Index (ADX) five bars in the future, we would likely use current ADX values, ADX momentum, DMI-plus and DMI-minus, because these are the components used to create ADX.
After developing inputs that are related to our target, we should look at the input variables for the system itself. One obvious example is the correlation between the S&P 500 index and T-bonds, because their decoupling is one of the reasons our forward-looking filter changed the signals of the standard intermarket divergence model. When developing an intermarket application, we would want to look at both short-term and long-term correlation. We would also want to study predictive correlation.
Predictive correlation takes the momentum of the predictive market shifted back “n” days, compares it with the momentum of the S&P 500 over the last “n” days, and runs a Pearson correlation on the two series. It tells us whether T-bond momentum has recently been predictive of the S&P 500.
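A Python sketch of both measures follows. The window lengths, and the use of closes rather than returns for the plain correlation, are assumptions; predictive correlation here pairs T-bond momentum from n bars earlier with S&P 500 momentum over the last n bars, which is one reading of the description above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def momentum(close, n):
    """close[i] - close[i - n], None where history is too short."""
    return [close[i] - close[i - n] if i >= n else None for i in range(len(close))]


def rolling_correlation(a, b, window, i):
    """Correlation of two close series over the `window` bars ending at bar i.
    Call with a short and a long window for short- and long-term correlation."""
    if i - window + 1 < 0:
        raise ValueError("not enough history")
    return pearson(a[i - window + 1 : i + 1], b[i - window + 1 : i + 1])


def predictive_correlation(bond_close, sp_close, n, window, i):
    """Correlate bond momentum from n bars earlier with current S&P momentum."""
    first = i - window + 1
    if first - n < n:
        raise ValueError("not enough history for this n and window")
    bond_mom = momentum(bond_close, n)
    sp_mom = momentum(sp_close, n)
    xs = [bond_mom[j - n] for j in range(first, i + 1)]  # shifted back n bars
    ys = [sp_mom[j] for j in range(first, i + 1)]        # last n bars of S&P
    return pearson(xs, ys)
```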
Next, we need to sample each of these inputs at several points in the past. The samples should be closer together near the current bar and farther apart as we move back in the sequence. For example, we might sample past values of the inputs at 0, 1, 3, 5, 10, 20 and 30 bars back to predict the output.
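Sampling at uneven lags is easy to express in code. This Python sketch assumes each input is a list indexed by bar and uses the lags from the text; the helper names are hypothetical.

```python
LAGS = (0, 1, 3, 5, 10, 20, 30)


def sample_lags(series, i, lags=LAGS):
    """Return series[i - lag] for each lag, most recent sample first."""
    if i - max(lags) < 0:
        raise ValueError("not enough history for the longest lag")
    return [series[i - lag] for lag in lags]


def build_input_vector(input_series, i, lags=LAGS):
    """Concatenate the lagged samples of every input series at bar i."""
    vector = []
    for series in input_series:
        vector.extend(sample_lags(series, i, lags))
    return vector
```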
By combining the components of the output, shifted back in time, with strong fundamental relationships, you can see how the desired output is a great starting point for developing your neural network inputs.
BUILDING THE NET
First-generation neural networks were trained using as much as 10 years of data, which in some cases represented 80% of the history of the futures market. This left very little data available for out-of-sample testing, and the in-sample period was often used to judge how well the neural network had learned. Often, the results would be great during the in-sample period but bad during the out-of-sample period.
During the late 1990s, products such as NeuralShell Trader introduced the concept of walk-forward training of neural networks. A network could be trained on 10 years of data, and the window then rolled forward one year at a time, retraining the network each time. In theory, this gave us, in most cases, at least five years of walk-forward out-of-sample testing, so we could backtest a system using only out-of-sample networks over a period long enough to see how the neural network helped the trading results. When walk-forward testing first came into use, many of the networks developed during the early 1990s did not retrain consistently as the window walked forward.
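The rolling scheme can be sketched as a generator of index ranges. This Python version assumes half-open ranges and a step equal to the out-of-sample length; with 15 years of yearly data, a 10-year window and a one-year step, it yields five out-of-sample years.

```python
def walk_forward_windows(n_bars, train_len, test_len):
    """Yield (train_start, train_end, test_end) triples with half-open ranges.

    Each network is trained on [train_start, train_end) and traded
    out-of-sample on [train_end, test_end); the window then rolls forward
    by test_len bars.
    """
    if train_len <= 0 or test_len <= 0:
        raise ValueError("window lengths must be positive")
    start = 0
    while start + train_len < n_bars:
        train_end = start + train_len
        yield start, train_end, min(train_end + test_len, n_bars)
        start += test_len
```

The same function covers bar-based schedules, for example a 1,500-bar training window rolled forward every 250 bars.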
It took clever development to create neural networks that remained stable as they walked forward. Walk-forward testing also magnified a known problem: neural networks can settle into local minima, and because they start with random weights, they can arrive at a different solution each time they are trained.
This means that a system could have a good design and, if trained 10 times, might produce only three or four solutions that were good enough on the training set to move on to the test and out-of-sample sets. So, walk-forward testing also needs to be able to retrain and evaluate the network as it walks forward, and not stop until it finds a network that is predictive on the training set with the right error distribution to maximize profits. Each network would then be traded for 200 to 250 bars before the window was rolled forward and the network retrained.
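Here is a minimal Python sketch of the retrain-until-acceptable loop. It assumes caller-supplied train_fn and score_fn, which are hypothetical hooks standing in for the network training and for whatever training-set criterion is used. Because the text does not say what happens if no network qualifies, this version falls back to the best attempt after max_attempts; that fallback is an assumption.

```python
import random


def train_until_acceptable(train_fn, score_fn, threshold, max_attempts=10, seed=0):
    """Retrain from fresh random weights until the training score meets threshold.

    train_fn(rng) -> model and score_fn(model) -> float are supplied by the
    caller. Returns (model, score, attempt_number) for the first acceptable
    model, or for the best model seen if none meets the threshold.
    """
    rng = random.Random(seed)  # shared RNG so each attempt gets new weights
    best = None
    for attempt in range(1, max_attempts + 1):
        model = train_fn(rng)
        score = score_fn(model)
        if best is None or score > best[1]:
            best = (model, score, attempt)
        if score >= threshold:
            return model, score, attempt
    return best
```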
This is the mechanism. Now, the question is how to implement it. We are going to use TradersStudio with its NeuralStudio add-in. NeuralStudio implements Backprop and Radial nets. In this example, we will use simple classic Backprop.
The first issue in implementation is that most high-quality backtesting platforms, including TradersStudio and TradeStation, do not let you look into the future, with the exception of the next day’s open. For this reason, rather than referencing future data, we predict the future by shifting the inputs back in time. In TradersStudio, this was accomplished using the following function:
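Function ShiftBack(Series, DaysBack) As BarArray
' Returns the value of Series from DaysBack bars ago, so an input can be
' lined up with a target that ends on the current bar
Dim XSeries As BarArray
XSeries = Series
ShiftBack = XSeries[DaysBack]
End Function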
We would use it as follows:
NNInput20 = ShiftBack(Cor, DaysPredict)
This example takes the current value of the correlation (Cor) and shifts it back so we can predict the sign of the price change five bars in the future. We would set DaysPredict to 6 and then use Sign(Close - Close[DaysPredict - 1]) as our target. This methodology has existed since the early 1990s, when the first neural network trading tools were developed, but only recently have analysts understood how to use it well.
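The alignment can be checked in Python. This sketch mirrors ShiftBack(indicator, DaysPredict) and Sign(Close - Close[DaysPredict - 1]) on plain lists; training_pairs is a hypothetical helper, not a TradersStudio function.

```python
def sign(x):
    return (x > 0) - (x < 0)


def training_pairs(indicator, close, days_predict=6):
    """Pair the shifted-back input with the current-bar target.

    At bar i the input is indicator[i - days_predict] and the target is
    sign(close[i] - close[i - (days_predict - 1)]). Seen from the input's
    bar, that target is the price change from the next bar to five bars
    later (for days_predict = 6), yet both values are known at bar i.
    """
    pairs = []
    for i in range(days_predict, len(close)):
        x = indicator[i - days_predict]                        # ShiftBack
        y = sign(close[i] - close[i - (days_predict - 1)])     # target
        pairs.append((x, y))
    return pairs
```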
To develop our neural network, we will write a simple system that trains on a window of 1,500 or 2,000 bars of data and retrains every 200 to 300 bars. We will test each network on the training set to see how it performs and retrain it until it meets the established criteria. It will then be traded until the next scheduled retraining. We used the neural network output as follows:
' Enter only when the divergence signal and the network agree
If InterDivIndic = 1 And MyPred > Trigger Then Buy("", 1, 0, Market, Day)
If InterDivIndic = -1 And MyPred < -Trigger Then Sell("", 1, 0, Market, Day)
' Exit when the network output no longer confirms the open position
If MyPred < Trigger Then ExitLong("LX", "", 1, 0, Market, Day)
If MyPred > -Trigger Then ExitShort("SX", "", 1, 0, Market, Day)
In this example, InterDivIndic is the standard intermarket divergence model and MyPred is the neural network output. The target has three values: -1, 0 and 1. Trigger gives us an error zone around zero: if the network output falls inside that zone, we treat it as zero and exit any open trade. The network is not used to create new signals, only to filter existing ones.
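Expressing the same filter as a pure Python function makes the error zone easy to test. This sketch assumes a position of -1, 0 or +1 with one unit per trade; next_position is a hypothetical name.

```python
def next_position(position, inter_div, pred, trigger):
    """Apply the network filter to the divergence signal for one bar.

    position: current position (-1, 0 or +1); inter_div: divergence signal;
    pred: network output; trigger: half-width of the error zone around zero.
    """
    if inter_div == 1 and pred > trigger:
        return 1   # Buy: signal and network agree
    if inter_div == -1 and pred < -trigger:
        return -1  # Sell: signal and network agree
    if position == 1 and pred < trigger:
        return 0   # ExitLong: network no longer confirms the long
    if position == -1 and pred > -trigger:
        return 0   # ExitShort: network no longer confirms the short
    return position
```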
When using algorithms such as Backprop, it is also important to decide on other technical factors, such as the number of hidden nodes and the learning rate. This is not the case with newer algorithms such as Radial nets, which are based on interpolation using nearest-neighbor matching.
After you have experimented with inputs, unless you find a good solution on the first try (possible but rare), you need to study how the neural network and the system performed over time. Try to understand why the system performed badly during certain periods, and develop inputs that give the network the information it needs to resolve those problems.
After a model has been developed and tested, the next step is to look for examples that show why something happened, so that we can give the network what it needs to find a good solution. For example, you might find, after adding a 50-day moving average to the chart, that rallies stopped well short of it. If the close minus the 50-day moving average, normalized using the standard deviation, is included as a sampled input, we would see that prices crossed below the 50-day average after we bought and stayed below it for about 20 days before the market collapsed. This input could give the network the information it needs to turn the signal off by late August, before the serious damage is done.
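Here is a Python sketch of the normalized distance from the 50-day average. Normalizing by the standard deviation of the same 50 closes is an assumption, since the text does not specify which standard deviation is used. The result can then be sampled at lags such as 0, 1, 3, 5, 10 and 20 bars.

```python
import statistics


def normalized_ma_distance(close, i, length=50):
    """(close - SMA(length)) / stdev of the same `length` closes, at bar i."""
    if i < length - 1:
        raise ValueError("not enough history")
    window = close[i - length + 1 : i + 1]
    sd = statistics.stdev(window)
    if sd == 0:
        return 0.0  # flat market: no meaningful distance
    return (close[i] - sum(window) / length) / sd
```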
Also, when developing a neural network model, it is important to avoid inputs that are too strongly correlated. An example would be deciding whether to use both the slow stochastic %K and the relative strength index in the same model, or only one of them because they effectively impart the same information. Scatter charts and other correlation tools can tell you whether such indicators are related in significant ways. They may, for instance, be correlated at lower levels of their inputs but quite independent at higher levels.
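Scatter-chart checks can be complemented with a quick numeric screen. This Python sketch flags input pairs whose Pearson correlation reaches a chosen threshold (0.9 here, an arbitrary assumption) and computes the correlation within one slice of an input's range, a rough stand-in for inspecting part of a scatter chart.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(inputs, threshold=0.9):
    """Return (name_a, name_b, r) for every input pair with |r| >= threshold."""
    flagged = []
    for (a, xs), (b, ys) in combinations(inputs.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


def correlation_in_range(xs, ys, lo, hi):
    """Correlation using only the points where lo <= x < hi (None if < 3 points)."""
    pts = [(x, y) for x, y in zip(xs, ys) if lo <= x < hi]
    if len(pts) < 3:
        return None
    return pearson([p[0] for p in pts], [p[1] for p in pts])
```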
What should become clear by now is that by using neural networks as overlays on existing systems rather than as the systems themselves, we not only protect against the risk of implosion but also open up new avenues to use our creativity to improve our trading models in ways that were never possible before.
By using today’s new computing power and applying advanced technologies such as neural networks, genetic algorithms and fuzzy logic, the door has been opened for a new age of computerized self-adaptive trading models. Now as we move into the 21st century, these advanced technologies are finally achieving some of the promise shown during their first incarnation.
Murray A. Ruggiero Jr. is a consultant in East Haven, Conn. His firm, Ruggiero Associates develops market-timing systems. He is editor-in-chief of Inside Advantage Gold Club (www.iagoldclub.com) and is the author of Cybernetic Trading Strategies (John Wiley & Sons). E-mail: ruggieroassoc@aol.com.
