The “magic” of backpropagation, or backprop, is that mathematical calculations (the type typically found in first-year calculus) adjust the weights of the connections to minimize the error across the training set. An important attribute of these methods is that they produce a reasonably low error across the training set of inputs. However, they do not find the absolute minimum error, only a local minimum. This means that training a neural network is not exact: the result depends on the precise data set, and repeating the same experiment does not always give the same answer.
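The calculus involved can be sketched in a few lines. Below is a minimal illustration, not the full backprop algorithm: gradient descent adjusting the weight and bias of a single linear neuron to minimize squared error over a toy training set. The data, learning rate, and variable names are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy training set: inputs x and targets y for the line y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-1.0, 0.0, 1.0, 2.0]]

w, b = random.random(), random.random()  # random starting weights
lr = 0.1                                 # learning rate

for epoch in range(200):
    for x, y in data:
        pred = w * x + b
        err = pred - y
        # First-year calculus: derivatives of the squared error err**2
        # with respect to each weight.
        w -= lr * 2 * err * x   # d(err^2)/dw = 2 * err * x
        b -= lr * 2 * err       # d(err^2)/db = 2 * err

print(round(w, 2), round(b, 2))  # settles near 2.0 and 1.0
```

On this convex toy problem gradient descent finds the single minimum; in a real multi-layer network the error surface has many local minima, which is why repeated training runs from different random starting weights can give different answers.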
Backprop, in its original form, had a number of weaknesses, and many variations of the algorithm attempt to resolve them. Early ideas used momentum and variable learning-rate adjustment techniques, such as simulated annealing. Combining newer tactics with older ones can further optimize learning; for example, batch learning can be run in parallel across multiple cores, saving a tremendous amount of time. All of these variations are supervised learning algorithms: we give them input patterns and train them to output a certain target set of results. In doing so, we map the patterns, which in turn allows us to generalize to new patterns that were not used in training.
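The momentum idea mentioned above can be sketched concisely: a velocity term accumulates past gradients, so updates keep moving through shallow, flat regions instead of stalling. The loss function and constants below are illustrative assumptions, not from the text.

```python
def grad(w):
    # Gradient of a simple bowl-shaped loss, (w - 3)^2.
    return 2.0 * (w - 3.0)

w = 0.0    # starting weight
v = 0.0    # velocity: an exponentially decaying sum of past gradients
lr = 0.1   # learning rate
mu = 0.9   # momentum coefficient

for _ in range(100):
    v = mu * v - lr * grad(w)  # blend previous velocity with new gradient
    w += v                     # step along the accumulated direction

print(round(w, 3))  # approaches the minimum at 3.0
```

With mu = 0 this reduces to plain gradient descent; larger values of mu smooth the trajectory over more of the gradient history.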
There are other algorithms, such as radial basis function (RBF) networks and kernel methods such as support vector machines (SVMs). All of these algorithms can be used to create approximations of non-linear functions, which is essentially what a neural network does when it maps a given input to an output. Put simply, we create a universal function “approximator” that, given a set of inputs, can provide a good idea of what the optimal solution to a problem would be.
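A minimal sketch of this idea is Nadaraya-Watson kernel regression with a Gaussian (RBF) kernel: given noisy-free samples of a non-linear function, it predicts outputs at new inputs as a kernel-weighted average of the training outputs. The target function, bandwidth, and sample spacing here are illustrative assumptions.

```python
import math

def kernel(a, b, bandwidth=0.3):
    # Gaussian (RBF) kernel: weight falls off with distance between inputs.
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

# Training samples of a non-linear target function, f(x) = sin(x).
xs = [i * 0.25 for i in range(25)]   # inputs 0.0 .. 6.0
ys = [math.sin(x) for x in xs]

def predict(x):
    # Kernel-weighted average of the training outputs.
    weights = [kernel(x, xi) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# The approximator recovers the curve at inputs not in the training set.
for x in [0.9, 2.1, 4.4]:
    print(round(predict(x), 2), round(math.sin(x), 2))
```

The same mechanism, replace the kernel or add more training samples, approximates other smooth non-linear mappings, which is the sense in which these methods act as universal function approximators.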
As with most things, interest in neural networks took off when the customer started demanding it. Traders, hungry for the next big thing, were clamoring for the technology in the early 1990s. However, the vast majority of these traders had no background in the underlying methods, and those who had the background knew nothing about the markets or how they work.
But neural networks were not the perfect solution, and after many years of trial and error, it became clear why: Standard neural-network-based signal processing techniques simply do not work in the markets as signal generators. In other words, the process of implementing neural networks correctly must begin far earlier in trading system development. You must create systems that work well without neural networks for neural networks to be able to improve them.
However, this knowledge was not common in the early 1990s. Many large institutions brought in neural network experts who built models without incorporating domain expertise (market knowledge). Errors of this type were prevalent even at large banks with huge budgets.
In one specific case, a large bank had an in-house team that developed a trading model using deterministic models, such as the Mackey-Glass equation, to simulate data. The Mackey-Glass equation is a time-delay differential equation that can generate a curve that looks like a stock market price series, though it is most often used to model biological processes, such as the regulation of white blood cell production. Expertise in these methods was used as a substitute for market knowledge, and the models failed completely.
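To make the Mackey-Glass equation concrete, here is a sketch that integrates it with a simple Euler step, assuming the commonly cited chaotic parameter values (beta = 0.2, gamma = 0.1, n = 10, tau = 17); the step size and initial history are illustrative choices, not the bank's actual method.

```python
def mackey_glass(steps, beta=0.2, gamma=0.1, n=10, tau=17, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t)."""
    history = [x0] * int(round(tau / dt))  # constant history before t = 0
    series = []
    x = x0
    for _ in range(steps):
        x_tau = history[0]                 # the delayed value x(t - tau)
        dx = beta * x_tau / (1.0 + x_tau ** n) - gamma * x
        x += dt * dx
        history.pop(0)                     # slide the delay window forward
        history.append(x)
        series.append(x)
    return series

data = mackey_glass(500)
print(min(data), max(data))  # the series wanders irregularly but stays bounded
```

The irregular, bounded oscillations this produces resemble a price chart on casual inspection, which is exactly why it seemed like a plausible stand-in for market data, and why substituting it for real market knowledge was a mistake.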