**Application**

The first step in building a trading system that uses these advancements is to develop a robust rule-based trading model. Next, we identify parts of the model that neural networks can improve. Viable applications include predicting a certain feature of the model a few bars into the future and pattern identification. Then, we must test how robust the system is when that prediction is wrong. That is, if we are predicting something three bars into the future, we need to examine the worst-case scenario in which our “prediction” instead arrives three bars late. We want to predict only elements whose failure won’t cause the entire trading system to blow up.
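One way to picture this lateness test is the minimal sketch below. It rests on assumptions that are not from the article: a synthetic 40-bar price cycle, a one-bar momentum as the predicted feature, and a hypothetical helper `pnl_with_offset` that trades the sign of that feature. It compares the result when the feature is known three bars early with the result when it arrives three bars late.

```python
import numpy as np

def pnl_with_offset(close, feature, offset):
    """P&L from trading the sign of feature[t + offset] over the move from bar t to t + 1.

    offset > 0: the feature is known `offset` bars early (a perfect prediction).
    offset < 0: the feature arrives `-offset` bars late (the worst case to examine).
    """
    close = np.asarray(close, dtype=float)
    feature = np.asarray(feature, dtype=float)
    k = abs(offset)
    t = np.arange(k, len(close) - k - 1)      # keeps t + offset and t + 1 in range
    position = np.sign(feature[t + offset])   # +1 long, -1 short, 0 flat
    next_move = close[t + 1] - close[t]
    return float(np.sum(position * next_move))

# Synthetic data (an assumption): a clean cycle that repeats every 40 bars.
bars = np.arange(400)
close = 100 + 10 * np.sin(2 * np.pi * bars / 40)

# Illustrative feature: momentum[t] = close[t] - close[t - 1].
momentum = np.zeros_like(close)
momentum[1:] = np.diff(close)

pnl_early = pnl_with_offset(close, momentum, +3)   # prediction works as intended
pnl_late = pnl_with_offset(close, momentum, -3)    # prediction fails and shows up late

print(f"three bars early: {pnl_early:.1f}")
print(f"three bars late:  {pnl_late:.1f}")
```

On this smooth synthetic cycle the late version gives up part of the profit but stays positive, which is the kind of graceful degradation we want. On real data the gap could be far larger, so the same comparison should be run on the actual model and feature.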

Two important steps are defining exactly what we are going to predict and developing our pre-processing methodology. The best pre-processing is custom-built using advanced statistical analysis and data mining. This type of analysis is time-consuming and is one of the most expensive parts of building these models. A reasonable, simpler alternative was developed by Mark Jurik and is known as “Level-0” and “Level-1” pre-processing (see “On the level,” next page).

Other limitations are algorithm-based rather than mathematical. For example, kernel regression degrades when you have more than 12 inputs. This is not a major problem because methods such as principal component analysis (PCA) can be used to pre-process the inputs and reduce their dimensionality. PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables. These values are called “principal components.” The number of principal components is less than or equal to the number of original variables.
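As a rough illustration of that dimensionality reduction, the sketch below computes principal components with NumPy's singular value decomposition. The data are synthetic (20 correlated inputs driven by three hidden factors), and the helper name `pca_reduce` is an assumption made for this example, not part of any trading package.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each input column
    # Rows of Vt are the orthogonal directions (principal axes).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                 # linearly uncorrelated new inputs
    explained = S**2 / np.sum(S**2)            # share of variance per component
    return scores, components, explained[:n_components]

# Synthetic inputs (an assumption): 20 correlated columns from 3 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 20))
X = factors @ loadings + 0.05 * rng.normal(size=(500, 20))

scores, components, explained = pca_reduce(X, n_components=3)

print(scores.shape)                            # 500 observations, 3 inputs
print(f"variance retained: {explained.sum():.3f}")
```

Here 20 inputs collapse to three with almost no loss of variance, which would bring the input count comfortably under the 12-input limit mentioned above. Real market inputs are rarely this tidy, so the retained variance should be checked before components are discarded.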

Other issues deal with how the inputs are normalized and how they are distributed. For example, when a neural network uses the tanh() function, we want to make sure that our data are not concentrated at the extreme ends of the input range, where tanh() flattens out near -1 and +1. If they are, the network can’t learn effectively because large changes in the input produce almost no change in the output. The same is true for various kernels used in kernel regression functions.
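The effect can be checked directly. The sketch below is a minimal example under stated assumptions: the input is a synthetic price-like series, and the 0.99 saturation threshold, the three-standard-deviation clip, and the helper names `saturation_fraction` and `scale_for_tanh` are illustrative choices rather than a prescribed method.

```python
import numpy as np

def saturation_fraction(x, threshold=0.99):
    """Share of inputs that tanh() pushes beyond +/- threshold (the flat region)."""
    return float(np.mean(np.abs(np.tanh(x)) > threshold))

def scale_for_tanh(x, clip=3.0):
    """Z-score the inputs, clip outliers at +/- clip std devs, rescale to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.clip(z, -clip, clip) / clip

# Synthetic input (an assumption): raw price levels around 100.
rng = np.random.default_rng(0)
raw = rng.normal(loc=100.0, scale=10.0, size=1000)
scaled = scale_for_tanh(raw)

print(saturation_fraction(raw))      # raw levels sit on the flat part of tanh
print(saturation_fraction(scaled))   # scaled inputs stay on the responsive part
```

Feeding raw price levels into tanh() pins every output at the extreme, while the scaled version keeps the inputs where the function still responds. Other scalings would serve the same purpose; the point is to inspect the distribution before training.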

Pre-processing design is critical because the models are black boxes and statistical artifacts could easily control them. Statistical artifacts are apparent cause-and-effect relationships that are erroneous because no real causal link underlies them. An example can be seen in Nasdaq data over the 1998-2002 period. Over that period you would have observed an exceptionally high correlation between the price change over the next two days (the price two days later minus today’s close) and the price change over the past four days (today’s price minus the price four days ago). However, a model based on this relationship would have failed within a few months following the test period because the market started moving sideways; the relationship was present only during parabolic up and down environments.
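A simple guard against this kind of artifact is to measure the same relationship separately in different market regimes. The sketch below does so on synthetic data only; it does not use Nasdaq prices, and the two regimes (a parabolic rise and a flat, noisy market) as well as the helper `lead_lag_corr` are assumptions made for illustration.

```python
import numpy as np

def lead_lag_corr(close, back=4, ahead=2):
    """Correlation between the past `back`-bar change and the next `ahead`-bar change."""
    close = np.asarray(close, dtype=float)
    t = np.arange(back, len(close) - ahead)
    past_change = close[t] - close[t - back]       # today's price minus 4 bars ago
    future_change = close[t + ahead] - close[t]    # price 2 bars later minus today
    return float(np.corrcoef(past_change, future_change)[0, 1])

rng = np.random.default_rng(1)
bars = np.arange(500)

# Synthetic regimes (assumptions, not market data).
parabolic = 100 + 0.001 * bars**2 + rng.normal(scale=0.1, size=bars.size)
sideways = 100 + rng.normal(scale=0.1, size=bars.size)

corr_parabolic = lead_lag_corr(parabolic)
corr_sideways = lead_lag_corr(sideways)

print(f"parabolic regime: {corr_parabolic:+.2f}")
print(f"sideways regime:  {corr_sideways:+.2f}")
```

In the parabolic series the correlation is strongly positive; in the sideways series it turns negative. A relationship that changes sign between regimes is a candidate artifact and should not be handed to a black-box model unexamined.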

Error distribution is another factor to consider. Two different models with the same root mean square error could have opposite profit profiles depending on how the errors are distributed. Predicting turning points well makes most of the money for these models. A network can post a low statistical error while still missing turning points, for example by keeping its predictions around zero, or by being right on small moves and wrong on large ones. Conversely, a network that does well on large moves but is wrong on smaller moves could make more money even though it has a bigger statistical error.
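A toy calculation makes the point concrete. The numbers below are invented for illustration, and the profit rule (trade in the direction of the predicted move and earn the actual move) is a simplifying assumption, not the article's trading system.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted moves."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

def directional_profit(actual, predicted):
    """Profit from trading the predicted direction and earning the actual move."""
    return float(np.sum(np.sign(predicted) * np.asarray(actual)))

# Invented price moves: two large, two small.
actual = np.array([3.0, -3.0, 1.0, -1.0])

# Model A: right direction on the large moves, wrong on the small ones.
model_a = np.array([7.0, -7.0, -3.0, 3.0])
# Model B: wrong direction on the large moves, right on the small ones.
model_b = np.array([-1.0, 1.0, 5.0, -5.0])

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, "RMSE:", rmse(actual, pred), "profit:", directional_profit(actual, pred))
```

Every prediction in both models is off by exactly four points, so the two RMSE figures are identical, yet model A makes money and model B loses the same amount. The only difference is where the errors fall.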

In this article, we’ve traced the modern history of neural network application in the markets and provided an overview of currently accepted applications of the technologies. The next step is to build on this overview and lay out a real example that applies neural networks as a component of a profitable trading system. Finally, we will discuss how new technological advances could bring the full promise of neural nets back into focus.