On the level
Mark Jurik’s data pre-processing method assumes the price series is formed by cycles of different frequencies. If data were sampled at different frequencies, the samples would carry all the information in the series.
To do this, we sample blocks of data. The further back in time a block lies, the larger it is and the further it is spaced from the next block. The index n determines how far back in time the center of the block is situated. The value m is chosen so that the block covers the period between consecutive blocks. Both are listed as shown:
Row 1 (n):   1   2   3   4   5   7   9  13  17  25  33  49  65  97 129 193 257 385
Row 2 (m):   0   0   0   0   0   2   2   4   4   8   8  16  16  32  32  64  64 128
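Reading the table, each n after the fifth equals the previous n plus its own m, and each value of m is used twice before it doubles. The sketch below rebuilds both rows from that observation. The rule is inferred from the printed numbers only, not taken from Jurik's documentation, and the function name `block_indexes` and its parameters are hypothetical.

```python
def block_indexes(count=18, dense=5):
    """Return (n, m): how far back each block is centered, and its size value.

    Assumption: the doubling rule is inferred from the printed table.
    """
    n = list(range(1, dense + 1))  # the first five bars are sampled one by one
    m = [0] * dense                # m is 0 for those single-bar samples
    step = 2
    while len(n) < count:
        for _ in range(2):         # each step size appears twice, then doubles
            if len(n) == count:
                break
            n.append(n[-1] + step)
            m.append(step)
        step *= 2
    return n, m


n, m = block_indexes()
print("n:", n)
print("m:", m)
```

With the default arguments this prints the same 18 columns as the table, so a longer look-back can be had by raising `count`.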
This strategy gives the neural network the information it needs to look back in time without sampling every bar. For example, if we believe the price of gold affects the 10-year Treasury note for up to 50 bars, we would sample gold using row n rather than feed the network 50 columns of inputs. We would sample each of the first five days, and then our samples would become further apart, because the more widely separated samples are trying to pick up longer-term cycles. If we are trying to find a 30-day cycle, it can be reproduced by sampling every five days; there is no need to sample every day.
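To make the saving concrete, here is a minimal sketch of the 50-bar gold example. It keeps only the row-n indexes that fall inside the look-back window and reads the series at those points. The price series is made up, the name `sample_inputs` is hypothetical, and counting index 1 as the most recent bar is an assumption.

```python
# Indexes copied from row n of the table.
LAGS = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385]


def sample_inputs(series, lookback):
    """Return the lags within `lookback` and the series values at those lags.

    Assumption: `series` is ordered oldest first and lag 1 is the latest bar.
    """
    lags = [n for n in LAGS if n <= lookback]
    if not lags:
        raise ValueError("lookback must be at least 1")
    if len(series) < lags[-1]:
        raise ValueError("series is shorter than the largest lag")
    return lags, [series[-n] for n in lags]


gold = [400.0 + 0.5 * i for i in range(60)]  # made-up prices, oldest first
lags, inputs = sample_inputs(gold, lookback=50)
print(len(lags), "inputs at lags", lags)
```

For a 50-bar window the network sees 12 inputs (lags 1 through 49) instead of 50.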
Level-0 features are the normalization of price and the exponential moving average of price. These are sampled using row n in this table. Level-1 features are normalized price change relative to a block moving average.
Level 0 feature formula:
Level 1 feature formula:
This canned pre-processing works well for predicting moving average oscillators, provided past values of the output target, sampled in the same way, are included among the inputs.
Developing custom pre-processing requires deep domain expertise as well as data mining skills. A reasonable test of candidate inputs is to plot each one against our target in a scatter chart. We look for these scatter charts to show patterns, either linear or non-linear shapes. What we don’t want to see is a featureless blob, which suggests the input tells us little about the target.
We will also use approximation paradigms such as rough sets, and machine learning algorithms such as C4.5, that can judge the information content of the pre-processing we develop.
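C4.5 ranks candidate splits by gain ratio, an entropy-based measure of how much a feature tells us about the target. The sketch below computes only that criterion for features that are already discrete. It is not an implementation of C4.5 itself. The labels are made up and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def gain_ratio(feature, target):
    """Information gain of target given feature, divided by the split information."""
    total = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(target) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info else 0.0


# Made-up, already discretized examples.
target = ["up", "up", "down", "down", "up", "down", "up", "down"]
useful = ["hi", "hi", "lo", "lo", "hi", "lo", "hi", "lo"]  # tracks the target exactly
noise = ["hi", "lo", "hi", "lo", "lo", "hi", "hi", "lo"]   # unrelated to the target

print("useful:", gain_ratio(useful, target))
print("noise: ", gain_ratio(noise, target))
```

A feature that separates the target perfectly scores 1, and one that carries no information scores 0. Real pre-processed inputs fall somewhere in between and can be ranked on that scale.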
Murray A. Ruggiero Jr. is the author of “Cybernetic Trading Strategies” (Wiley). E-mail him at firstname.lastname@example.org.