Algorithm bias: A statistical review
Finally, combining the priors determined in the second set of equations and the likelihoods in “Model likelihoods,” we can calculate the posterior probability of each model, given the data. Recall that P(model|data) is proportional to prior × likelihood. The calculations are given in the equations below, where c2 is a constant to ensure that the posterior probabilities sum to one.
Given the data and our prior assumptions, the equations above show that the second and third models are the most probable, with the third model having the highest posterior probability. Applying the third model, parameterized as in the regression equations, to our out-of-sample data generates a return of 14.24% per annum before costs, as shown in “Model profits.”
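As a rough illustration of the posterior calculation described above, the following Python sketch multiplies each model's prior by its likelihood and normalizes the results so that they sum to one. The prior and likelihood values are invented for illustration and are not the figures from this article; the function name `posterior_probabilities` is likewise hypothetical.

```python
def posterior_probabilities(priors, likelihoods):
    """Return P(model | data) for each model, assuming posterior is
    proportional to prior x likelihood."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Unnormalized posterior: prior x likelihood for each model.
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("prior x likelihood must be positive for some model")
    # Dividing by the total is equivalent to multiplying by the
    # normalizing constant (c2 in the text), so the results sum to one.
    return [u / total for u in unnormalized]


# Hypothetical values for three candidate models (NOT the article's numbers).
priors = [0.2, 0.5, 0.3]
likelihoods = [0.01, 0.02, 0.04]

posteriors = posterior_probabilities(priors, likelihoods)
# Index of the most probable model (0-based, so 2 means the third model).
most_probable = max(range(len(posteriors)), key=posteriors.__getitem__)
print(posteriors, most_probable)
```

With these made-up inputs the third model ends up most probable even though the second has the larger prior, because its likelihood is twice as high.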
We chose the most probable model, but we can do better than that. It is optimal to take an average over all models, with each model’s prediction weighted by its posterior probability. This is known as Bayesian model averaging, and it increases our return to 15.65% per annum before costs, as shown in “Model profits.”
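The weighting step in Bayesian model averaging can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the per-model forecasts and posterior probabilities below are hypothetical, and `model_average` is an illustrative helper rather than the code used to produce the results in “Model profits.”

```python
def model_average(predictions, posteriors):
    """Posterior-weighted average of per-model predictions."""
    if len(predictions) != len(posteriors):
        raise ValueError("need one posterior probability per prediction")
    if abs(sum(posteriors) - 1.0) > 1e-9:
        raise ValueError("posterior probabilities must sum to one")
    # Each model's prediction counts in proportion to its posterior.
    return sum(w * y for w, y in zip(posteriors, predictions))


# Hypothetical next-period return forecasts from three models,
# with hypothetical posterior probabilities (NOT the article's numbers).
predictions = [-0.010, 0.004, 0.012]
posteriors = [0.1, 0.4, 0.5]

averaged = model_average(predictions, posteriors)
# For comparison: the forecast of the single most probable model.
selected = predictions[posteriors.index(max(posteriors))]
print(averaged, selected)
```

Selecting only the most probable model discards whatever the other models have to say; the averaged forecast keeps their information, discounted by how improbable each one is.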
OUT OF SAMPLE
The trading community typically worries about overfitting and statistical significance, but our practical successes have been due to the appropriate application of bias. Everyone should be a Bayesian: use domain knowledge to make intelligent and explicit assumptions, and adhere to the rules of probability. How well your learning algorithm is aligned with the domain determines how well you will generalize.
What does this mean in practice? A market price is generated by a non-stationary, stochastic, discontinuous and probably non-linear dynamic process, and any useful (that is, profitable) signal is extremely noisy. The resulting time series approximates a martingale, which makes prediction extremely difficult.
A profitable trading system is therefore surprising, and as such requires ample evidence. Moreover, as outlined above, the efficient market hypothesis implies that a simple signal should not persist, while the low signal-to-noise ratio dictates that a complex signal should not arise. We should therefore seek models of intermediate complexity.
Martin Sewell is a senior research associate at the University of Cambridge.