Modeling with neural networks

April 26, 2016 09:00 AM

In the past 20 years, not much has changed in the merger of technology and market analysis, even though the technology itself has advanced. Support vector machines (SVMs), for example, were invented in the last century, while particle-swarm-trained neural networks are newer. Preprocessing and post-processing techniques are in vogue, and tools such as digital filters, wavelets and Fisher transforms are in play. Nevertheless, the successful application of this methodology in the financial markets has failed to reach the mainstream.

The good news is that these tools are used productively in other fields. A mechanical engineer might use a neural network to process manufacturing data and adjust parameters to keep errors within an acceptable range, all while the numerical process control is running; this is akin to analyzing price trends. A forensic researcher might use an SVM for fingerprint identification, which mimics price pattern analysis.

Our goal is to change that. Here, we hope to take a step forward with SVM-based models (see "Taking hybrid trading systems to the next level," March 2016). Our first step is to address the purpose of our SVM: Are we using it for regression or for pattern recognition? The answer helps us decide not only which algorithm to use but also how to preprocess our data.

Pattern identification

There are two major preprocessing methods in pattern identification. One processes the data time-wise; the other treats it as an image with a sliding window. With time-slice data, we preprocess by normalizing each data point between 0 and 1 over an observational window. For example, we could use a simple scale, as below:

Scale = (Close − Lowest(Close, Window)) / (Highest(Close, Window) − Lowest(Close, Window))

We can also compute NewScale = 2*(Scale − 0.5) to scale from -1 to 1. Either way, we have one point for each time slice in the window.
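
Here is a minimal Python sketch of that time-slice scaling; the function names and the 20-bar window are illustrative assumptions, not part of the original method:

# A rolling 0-to-1 scale of the close over an observation window, plus the
# optional rescaling to -1..1. The window length is an assumed placeholder.
import numpy as np

def scale_window(closes, window=20):
    closes = np.asarray(closes, dtype=float)
    scaled = np.full(len(closes), np.nan)
    for i in range(window - 1, len(closes)):
        lo = closes[i - window + 1:i + 1].min()
        hi = closes[i - window + 1:i + 1].max()
        scaled[i] = (closes[i] - lo) / (hi - lo) if hi > lo else 0.5
    return scaled

def rescale_minus1_to_1(scaled):
    # NewScale = 2*(Scale - 0.5) maps 0..1 onto -1..1
    return 2.0 * (scaled - 0.5)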

Another method is to break the image into pixels. An example of this type of logic in trading would be point-and-figure charts. 

Point-and-figure charts are created from columns of Xs and Os, which alternate with each other. The Xs represent a period of rising prices and the Os represent a period of falling prices. Our output can be defined several different ways:

  1. Use one output per pattern, 0 to 1; a 1 means it matches that pattern, and a 0 means it does not.
  2. Use multiple outputs and encode multiple patterns in a series of 0-1 outputs.
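
The following Python sketch illustrates both encodings for a hypothetical library of 18 patterns; the pattern count and bit width are illustrative assumptions:

# Option 1: one 0/1 output per pattern (one-hot). Option 2: encode the pattern
# number across a few 0/1 outputs.
import numpy as np

N_PATTERNS = 18  # assumed pattern count for illustration

def one_output_per_pattern(pattern_index):
    target = np.zeros(N_PATTERNS)
    target[pattern_index] = 1.0   # 1 means "matches this pattern"
    return target

def binary_encoded(pattern_index, n_bits=5):
    return np.array([(pattern_index >> b) & 1 for b in range(n_bits)], dtype=float)

print(one_output_per_pattern(3))  # 1.0 in slot 3, zeros elsewhere
print(binary_encoded(3))          # [1. 1. 0. 0. 0.]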

When performing pattern identification with machine learning methods, there are two types of errors: false reject (FR) and false accept (FA). False rejection means we miss positive occurrences of a given pattern; false acceptance means we get false positives for it. Almost all pattern recognition methods suffer from one of these problems. Using post-processing and multiple models allows us to develop better pattern recognition results even with noisy data.

The work of Xinyu Guo, Xun Liang and Xiang Li uses neural networks for stock price pattern recognition. The experiment shows that neural networks can effectively learn the characteristics of patterns and recognize them accurately. Most beneficial patterns can be summarized as either a continuation or a reversal pattern. The continuation pattern indicates the trend will continue, whereas the reversal pattern indicates that it will change. The analysts identified 18 typical patterns, with 10 of them being continuation patterns and eight being reversal patterns. 

The researchers proposed transforming the original time series into a sequence of trend segments and corresponding features using a neural network and a segmentation process. Each feature would be calculated in terms of the price at the last time point within the segment.

The concept of time series segmentation has become a hot topic in data mining. Consider a simple variation: If we look at weekly bars and average the open, high, low and close of each bar, using that mean as our value, that would be a basic example of this methodology. The segment length can be any time frame: two days, three days, 10 days, etc.

Again, this is simplified. Actual algorithms split a given data window into segments and optimize both the segment size and the method used to calculate each point's value. The goal of this optimization is to maximize the information content available for extracting patterns.
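
A minimal Python sketch of the simplified weekly-bar variation appears below; the five-bar segment length and function name are assumptions, and real algorithms also optimize the segment boundaries:

# Break the series into fixed-length segments and represent each one by the
# mean of its open, high, low and close averages.
import numpy as np

def segment_means(opens, highs, lows, closes, seg_len=5):
    o, h, l, c = (np.asarray(x, dtype=float) for x in (opens, highs, lows, closes))
    values = []
    for s in range(len(c) // seg_len):
        sl = slice(s * seg_len, (s + 1) * seg_len)
        values.append(np.mean([o[sl].mean(), h[sl].mean(), l[sl].mean(), c[sl].mean()]))
    return np.array(values)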

For this study of 18 typical technical patterns, a three-layer feed-forward neural network was used. These networks are composed of one input layer, one hidden layer and one output layer. In the input layer, each neuron corresponds to a feature, while in the output layer each neuron corresponds to a predefined pattern. Once a sample is fed into the network, the output is a vector with all elements at 0 except the one corresponding to the pattern the sample matches. Work remains to be done to expand far beyond these 18 patterns, as well as to better control noise, but the results were good enough to show promise.
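
As an illustration only (not the researchers' code), here is how such a three-layer classifier could be set up in Python with scikit-learn; the feature count, hidden-layer size and random training data are placeholders:

# One input neuron per segment feature, one hidden layer, and one output per
# predefined pattern; MLPClassifier handles the one-per-pattern encoding.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_features, n_patterns = 12, 18
X = rng.normal(size=(500, n_features))      # placeholder feature vectors
y = rng.integers(0, n_patterns, size=500)   # placeholder pattern labels

net = MLPClassifier(hidden_layer_sizes=(20,), activation="logistic",
                    max_iter=2000, random_state=0)
net.fit(X, y)
scores = net.predict_proba(X[:1])           # one score per pattern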

Regression models

Neural networks can be used as universal function approximators. Almost any advanced regression method can be emulated with a neural network; ridge regression and logistic regression, for example, can be simulated this way. Where neural networks pull ahead is that logistic regression requires a linearly separable problem, while a backpropagation neural network does not. This is the neural network's edge: It can create a non-linear model when one is needed.
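
The sketch below makes the point concrete with an artificial, XOR-style dataset (purely illustrative) that is not linearly separable:

# Logistic regression cannot separate an XOR-like target; a small
# backpropagation network can.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)   # not linearly separable

linear = LogisticRegression().fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000, random_state=1).fit(X, y)

print("logistic regression accuracy:", linear.score(X, y))  # typically near 0.5
print("neural network accuracy:", mlp.score(X, y))          # typically near 1.0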

One application of this is to use neural networks as an alternative to classical statistical techniques for forecasting within the framework of the arbitrage pricing theory (APT) model for stock ranking. Peer-reviewed research shows that neural networks outperform standard statistical techniques in terms of forecasting accuracy and give significantly better models. In addition, we can use sensitivity analysis on the neural networks to provide a reasonable explanation of their predictive behavior and supply a more convincing model than standard regression methods.

One of the big problems with technical indicators is lag. Because neural networks can perform non-linear regression, they can be used to remove that lag by predicting the indicator a given number of bars into the future. Ruggiero Associates has used this type of method since the early 1990s. Predicting, for example, a crossing of a 14-period indicator three bars in the future can often produce three to five times the profit of the standard indicator. This is why using neural networks for regression is of interest to traders.
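
A hedged sketch of the idea follows; the random-walk prices, the simple 14-period moving average standing in for the indicator, and the 10-bar lookback are all illustrative assumptions:

# Train a regression network to predict a lagging indicator's value three bars
# into the future from its recent history.
import numpy as np
from sklearn.neural_network import MLPRegressor

def sma(x, period):
    # Trailing simple moving average, padded at the start.
    c = np.convolve(x, np.ones(period) / period, mode="valid")
    return np.concatenate([np.full(period - 1, c[0]), c])

closes = np.cumsum(np.random.default_rng(2).normal(size=2000)) + 100.0
ind = sma(closes, 14)                 # a lagging 14-period indicator
lookahead, lookback = 3, 10

X = np.array([ind[i - lookback:i] for i in range(lookback, len(ind) - lookahead)])
y = ind[lookback + lookahead:]        # indicator value three bars later

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=2)
model.fit(X[:-200], y[:-200])         # crude in-sample / out-of-sample split
forecast = model.predict(X[-200:])    # predicted indicator three bars ahead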

Preprocessing cookbook

Let’s now put together some preprocessing recipes for developing neural network and SVM models. Preprocessing methods have not advanced much during the past 25 years. We’ll look at a classic preprocessing method developed by Mark Jurik in the early 1990s. 

Jurik developed this methodology with "Level-0" and "Level-1" features as a generic preprocessing method. The base idea is that price series are formed by cycles of different frequencies. We sample the data densely near the current data point and spread the samples out as we go back, with the goal of covering one-and-a-half to two cycles. As we look at longer cycles, we can sample less frequently. By sampling at different frequencies, the samples carry all the information in the series.

To do this, we sample blocks of data. The further in the past a block lies, the larger it is and the wider its spacing from the next block. An index determines how far back in time the center of each block sits and is chosen so that the block covers the period between consecutive blocks.

The indexes are provided as shown:

Level 0 = Log(Value / Value[n]), where n is the number of days ago taken from the table "Sample size" (below).

“Level 1” uses the pair n,m to create a block moving average. Let’s assume n=7. Then we would average as such:

Level 1 = Value / BlockAverage
X = n − m/2
Y = n + m/2
BlockAverage = (Sum of Value[X] through Value[Y]) / (m + 1)

In the case of n=7 (with m=2), we would average elements 6, 7 and 8 and divide the current value by that block average.

This strategy provides the neural network with the information it needs to look back in time without sampling every bar. For example, if we believe the price of gold affects the 10-year Treasury note for up to 50 bars, we would use the sample rows with "n" less than or equal to 50; in this case the largest is 49. We would sample the first five days, and then our samples would become further and further apart, because the more widely spaced samples are meant to pick up longer-term cycles. A 30-day cycle, for instance, can be reproduced by sampling every five days without needing to sample every day.

"Level 0" features normalize price against the exponential moving average of price, sampled using row n in the table. "Level 1" features are normalized price changes relative to a block moving average.
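
A hedged Python sketch of this style of preprocessing is shown below. The (n, m) pairs are illustrative placeholders, not Jurik's actual "Sample size" table, and the Level-0 line follows the log-ratio formula given above:

# Build Level-0 and Level-1 style features for the most recent bar. Requires
# roughly 60+ bars of history for the largest (n, m) pair used here.
import numpy as np

SAMPLE_TABLE = [(1, 0), (2, 0), (3, 0), (5, 2), (8, 2), (13, 4), (21, 6), (34, 10), (49, 14)]

def level0_level1(values):
    v = np.asarray(values, dtype=float)
    level0, level1 = [], []
    for n, m in SAMPLE_TABLE:
        # Level 0: log ratio of the current value to the value n bars ago.
        level0.append(np.log(v[-1] / v[-1 - n]))
        # Level 1: current value relative to a block average centered n bars
        # ago, spanning the m+1 points from n - m/2 to n + m/2 bars back.
        lo, hi = n - m // 2, n + m // 2
        block = v[len(v) - 1 - hi: len(v) - lo]
        level1.append(v[-1] / block.mean())
    return np.array(level0), np.array(level1)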

Fisher transform

It is commonly assumed that prices have a Gaussian, or normal, probability density function. A Gaussian probability density function is the familiar bell-shaped curve where one standard deviation includes 68% of the samples surrounding the mean. The Fisher Transform changes the probability density function of any waveform so that the transformed output has an approximately Gaussian shape. Let’s see code to calculate the Fisher Transform in TradeStation’s EasyLanguage below. This was originally published by John Ehlers:

Inputs: Price((H+L)/2), Len(10);
Vars: MaxH(0), MinL(0), Fish(0);
MaxH = Highest(Price, Len);
MinL = Lowest(Price, Len);
Value1 = .33*2*((Price - MinL)/(MaxH - MinL) - .5) + .67*Value1[1];
If Value1 > .99 then Value1 = .999;
If Value1 < -.99 then Value1 = -.999;
Fish = .5*Log((1 + Value1)/(1 - Value1)) + .5*Fish[1];
Plot1(Fish, "Fisher");
Plot2(Fish[1], "Trigger");

If the prices are normalized to fall within the range from –1 to +1 and subjected to the Fisher Transform, the extreme price movements are relatively rare events. This means the turning points can be clearly and unambiguously identified. “Value1” is a function to normalize price within its last 10-day range. The period for the range is adjustable as an input. “Value1” is centered on its midpoint and then doubled so that “Value1” will swing between the –1 and +1 limits. “Value1” is also smoothed with an exponential moving average whose alpha is 0.33. The smoothing may allow “Value1” to exceed the 10-day price range, so limits are introduced to preclude the Fisher Transform from blowing up by having an input value larger than unity. The Fisher Transform is computed to be the variable “Fish.” The plots of Fish provide a crossover system that identifies the cyclic turning points.

Here's how we would apply the Inverse Fisher Transform to the classic relative strength index (RSI) indicator. This code also comes from John Ehlers.
 
Vars: IFish(0);
Value1 = 0.1*(RSI(Close, 5) - 50);
Value2 = WAverage(Value1, 9);
IFish = (ExpValue(2*Value2) - 1) / (ExpValue(2*Value2) + 1);
Plot1(IFish, "IFish");
Plot2(0.5, "Sell Ref");
Plot3(-0.5, "Buy Ref");

Using the Fisher Transform normalizes the data and its distribution, which can be useful in developing neural network models. 

Noise controversy

In the early days of analyzing markets with neural networks, we smoothed both the input and output data because backpropagation networks had issues with noise. These noise removal methods were mostly variations of different moving averages. A more sophisticated method is the Kalman filter, introduced by Dr. R. E. Kalman in 1960. In general, it can be thought of as generating an optimal (in a linear, white-noise, mean-square-error sense) estimate of a future position based on the target's velocity, acceleration and uncertainties. The Kalman filter's goal is to create a zero-lag signal with greatly reduced noise.

The problem with the Kalman filter is that it requires the underlying system to be modeled accurately, and there is no such model for the market. Hacks, such as assuming a certain amount of white noise, must be made to use Kalman filters.
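
To make the mechanics concrete, here is a minimal Python sketch of a constant-velocity Kalman filter on price under exactly that white-noise hack; the state model and the q and r noise settings are illustrative assumptions:

# Track [price, velocity] and return filtered price estimates.
import numpy as np

def kalman_smooth(prices, q=0.01, r=1.0):
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition: price += velocity
    H = np.array([[1.0, 0.0]])               # we observe price only
    Q = q * np.eye(2)                        # assumed (white) process noise
    R = np.array([[r]])                      # assumed measurement noise
    x = np.array([prices[0], 0.0])
    P = np.eye(2)
    out = []
    for z in prices:
        x = F @ x                            # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                  # update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)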

Unfortunately, removing noise not only causes lag but also creates other issues, such as phase shifting. In addition, filtering noise can destroy information; for example, we might lose volatility data. One solution is to supply the filtered data and the noise component as two separate input variables so that we do not lose this information. Volatility, as measured by standard deviation or range calculations, is an example of valuable noise, and it's important to keep that valuable noise rather than filter it out during model development.

Variable reduction

Principal component analysis (PCA) is a statistical method used to reduce the variables in a dataset. It does so by picking out highly correlated variables and “clumping” them together. Obviously, this comes at the expense of some accuracy, but it has the benefit of fewer inputs, which is important for generalization, trading speed and the algorithms you choose. SVMs work better with limited inputs. 

Consider a dataset with two variables and suppose you want to simplify it. You probably will not see a pressing need to reduce an already succinct dataset, but let’s consider this relatively rudimentary example for the sake of simplicity. 

The two variables are:

  1. Dow Jones Industrial Average (DJIA), a stock market index composed of 30 of America's biggest companies.
  2. S&P 500 Index, a similar aggregate of 500 stocks of large American-listed companies. It contains many of the companies that the DJIA comprises.

“Related markets” (below) is a plot of their daily price changes. In theory, PCA will allow us to represent the data along one axis. This axis is the principal component, and it is represented by the black line.

In reality, you would not use PCA to transform two-dimensional data into one dimension; rather, you simplify data of higher dimensions into lower dimensions. Reducing data to a single dimension also reduces accuracy, because the data do not neatly hug the axis but vary about it. That is the trade-off.
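
A brief Python sketch of this two-variable example follows; the correlated random series are stand-ins for the real DJIA and S&P 500 daily changes:

# Project two highly correlated return series onto a single principal component.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
sp500 = rng.normal(0.0, 1.0, size=1000)            # placeholder daily changes
djia = 0.95 * sp500 + rng.normal(0.0, 0.3, 1000)   # highly correlated with S&P 500

X = np.column_stack([djia, sp500])
pca = PCA(n_components=1)
component = pca.fit_transform(X)                    # data along the principal axis

print("variance explained:", pca.explained_variance_ratio_[0])  # high, but below 1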

Building trading models

The first step in building a model is picking our inputs. One classic way to pick inputs is to use scatter charts. Let's look at S&P 500 earnings vs. S&P 500 prices shifted forward one month (see "Following earnings," below). Several different patterns are worth noting; both a non-linear "u" shape and a linear relationship are useful when selecting inputs for neural networks. Here, we have a curvilinear shape.
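
The Python sketch below shows the general approach with synthetic stand-in data; the series, the curvilinear link and the 21-bar shift are assumptions for illustration only:

# Scatter a candidate input against the target shifted forward one month and
# inspect the shape of the relationship.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
earnings = np.linspace(50, 150, 300) + rng.normal(0, 5, 300)   # stand-in input
sp500_price = 0.002 * (earnings - 60) ** 2 + 1000              # illustrative curvilinear link

shift = 21  # roughly one month of trading days
plt.scatter(earnings[:-shift], sp500_price[shift:], s=5)
plt.xlabel("S&P 500 earnings")
plt.ylabel("S&P 500 price, one month later")
plt.show()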

One area of neural network research from the early 1990s is developing neural network models using so-called hints. That means we add domain-expert-based inputs, derived either from rule-based trading systems or from discretionary models, to guide the neural network toward more robust models. For example, we could add an input based on a rule system with a value of -1, 0 or 1 for short, flat or long. If the systems used for hints are robust and profitable, they will push the neural network toward learning a profitable path from the data.
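
Here is a hedged sketch of such a hint input; the moving-average crossover rule and its parameters are placeholders for whatever robust system you would actually encode:

# Encode a simple trend rule as -1/0/1 (short/flat/long) and append it to the
# network's feature set.
import numpy as np

def hint_input(closes, fast=10, slow=40, band=0.001):
    closes = np.asarray(closes, dtype=float)
    hints = np.zeros(len(closes))
    for i in range(slow, len(closes)):
        f = closes[i - fast:i].mean()
        s = closes[i - slow:i].mean()
        if f > s * (1 + band):
            hints[i] = 1.0      # rule says long
        elif f < s * (1 - band):
            hints[i] = -1.0     # rule says short
    return hints

# features = np.column_stack([other_features, hint_input(closes)])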

A divide and conquer algorithm is a common approach. It works by recursively breaking down a problem into two or more sub-problems that are related (divide), until these become simple enough to be solved directly (conquer). 

Here’s the process:

  1. Instead of training one neural network, we will develop multiple neural networks with each being trained on a subset of the data, which breaks the data space into different market regimes. 
  2. Split with domain expertise, such as one net for a trending market and one for a range-bound market. 
  3. Use rough sets to break up the space. 
  4. Break up the data based on volatility regimes.

We will then train a neural network on each subset of data and, as new data come in, use the network that applies to them. We also might train one neural network on data from when the market is above the 200-day moving average and another from when it is below.
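
The Python sketch below illustrates that last split; the 200-day moving average regime test and the small regression networks are assumptions for illustration:

# Train one network per regime (above/below the 200-day moving average) and
# route each new bar to the matching network. X, y and closes are aligned arrays.
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_regime_models(X, y, closes, ma_period=200):
    closes = np.asarray(closes, dtype=float)
    ma = np.array([closes[max(0, i - ma_period + 1):i + 1].mean() for i in range(len(closes))])
    above = closes > ma
    models = {}
    for regime in (True, False):
        mask = above == regime
        models[regime] = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                      random_state=5).fit(X[mask], y[mask])
    return models

def predict_new_bar(models, x_new, close_new, ma_new):
    return models[close_new > ma_new].predict(x_new.reshape(1, -1))[0]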

Post-processing

Neural network post-processing is an important step in putting the models we have built to practical use. It is often used to reduce systematic errors; for example, we might model the network's current error and use that model to reduce the overall error of the forecast.
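
One simple, hedged sketch of this idea is to correct each new forecast by the model's recent average error; the 10-bar error window is an assumption, and a second model could stand in for the simple average:

# Shift each forecast by the mean of the last few forecast errors.
import numpy as np

def error_corrected(forecasts, actuals, window=10):
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    corrected = forecasts.copy()
    for i in range(window, len(forecasts)):
        recent_error = np.mean(actuals[i - window:i] - forecasts[i - window:i])
        corrected[i] = forecasts[i] + recent_error
    return corrected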

Although the current research on post-processing is thin, it is one of the more important areas of practical research with respect to neural networks and the markets.

In this installment, we discussed data preprocessing and model-development strategies. The next step is to build actual models for trading the markets.

About the Author

Murray A. Ruggiero Jr. is the author of "Cybernetic Trading Strategies" (Wiley). E-mail him at ruggieroassoc@aol.com.