MULTI-VARIATE FINANCIAL INDEX PREDICTION - A NEURAL NETWORK STUDY
Colin G Windsor and Antony H Harker

B521.2, Harwell Laboratory, Oxon. OX11 0RA

From the IEEE International Neural Network Conference, Paris, Kluwer Academic Publishers, pp 357-360, 1990.

ABSTRACT

The annual movements of up to 14 financial indices have been predicted simultaneously using a back propagation network. Expanded functions of deviations from a linear fit to the logarithms of the indices were presented to the network in order to bring out the structure in the training data. Multiple targets were used, one for each index. They were calculated using a sigmoidal function whose value varied from 0 for a sharp fall to 1 for a sharp rise, giving a high sensitivity for changes around the linear regression. The number and weighting of the target indices were varied to optimise the prediction accuracy of a given index. Estimated statistical errors on each index were used to enlarge the size of the training set, and to estimate the reliability of the prediction. The predictions compare favorably with those from multiple linear auto-regression. The parameters of the network were chosen to optimise the prediction accuracy during a series of years. The best results were obtained with relatively small networks having around 3 indices, presenting around 3 years simultaneously, and using about 4 hidden units. The trade-off between accuracy in assimilating the training data and accuracy in prediction is explicitly demonstrated.

1. THE NEED FOR A MULTIVARIATE APPROACH

The full line in figure 1 shows the annual movements of the UK Ordinary Share Index. Its fluctuations derive partly from the random nature of the individual transactions which generate its movements, and partly from political decisions which change market conditions. However, seen on this annual scale, the statistical fluctuations are minimised and much of its structure can be interpreted. The dashed line in figure 1 shows the Interest Rate superimposed on the Share Index. For a given year, the two indices are negatively correlated, as seen most clearly in 1974. However, when one is displaced by a year, there is a positive correlation, as shown inset. The political environment is suggested by the dotted curve showing the Conservative Majority. The election years correlate with the peaks in the Share Index. Such correlations can help to predict index movements.

Figure 1. The normalised logarithms of the Share Index (full line), and the Interest Rate (dashed), are shown on the left scale. Their cross correlation is shown inset. The Conservative Majority is shown dotted.

2. A MULTI-VARIATE TIME SERIES BACK PROPAGATION NETWORK.

The chartist approach to financial index prediction is to seek patterns in the past movements of one or more indices, and to make predictions by searching for the closest match between these patterns and the current situation. The standard back propagation method (2) can be applied in such a way as to reproduce this procedure. In the present tests, annual movements of up to 14 indices have been taken from the Economist publication Economic Statistics 1900-1983 (1). The set of indices used includes the UK Share Index, Interest Rate, M1 Money Supply, Gross National Product, Personal Disposable Income, Balance of Payments, Conservative Party Majority, Savings Ratio, Birth Rate, Unemployment and the US Balance of Payments and Dow Jones Index. The network is presented with functions of the movements over a training period up to a prediction year, y_p. This period, chosen here from 20 to 25 years, should be long enough to include the most typical movement patterns, yet not so long that the training conditions are no longer relevant. The tests have been made on a series of 14 prediction years from 1968 to 1981. The accuracy of the predictions P_i,y for each index i, having values I_i,y over a series of years y, was monitored using the fitting parameter

$$H_i = \frac{\sum_y (P_{i,y} - I_{i,y}) / n_y}{\sum_y (I_{i,y} - I_{i,y-1}) / n_y} \qquad (1)$$
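As a rough illustration of equation (1), the sketch below computes the fitting parameter for a single index. It assumes that the deviations are squared before summing, which the equation as printed does not show explicitly; the function name `fitting_coefficient` and the sample numbers are illustrative and are not taken from the paper.

```python
def fitting_coefficient(predicted, actual, previous):
    """Ratio of mean squared prediction error to mean squared annual change.

    Assumption: deviations are squared (not stated in the printed equation).
    `previous` holds the index value for the year before each entry of `actual`.
    """
    if not (len(predicted) == len(actual) == len(previous)):
        raise ValueError("all series must have the same length")
    n_y = len(actual)
    if n_y == 0:
        raise ValueError("need at least one year")
    err = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n_y
    change = sum((a - b) ** 2 for a, b in zip(actual, previous)) / n_y
    if change == 0:
        raise ValueError("index did not change; coefficient is undefined")
    return err / change


# Hypothetical values for three prediction years (not data from the paper).
actual = [1.10, 1.30, 1.20]
previous = [1.00, 1.10, 1.30]
predicted = [1.05, 1.25, 1.25]
h = fitting_coefficient(predicted, actual, previous)
```

Under this assumption, a value below 1 means the prediction does better than simply carrying the previous year's value forward, and a "no change" forecast scores exactly 1.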

The standard back propagation method is easily adapted to make predictions on n₀ indices, given values of n_i indices over a training period of n_l years before y_p. As illustrated in figure 2, for each training year y_t, the network is presented with functions f(I_i,y) of the indices for the year y_t, the previous year y_t-1, and so on, covering n_y presented years. Typically n_y is 3, as in a Box Jenkins approach with information on an index value, slope and curvature. The n_y.n_i inputs are fully connected to n_h hidden units, and these are in turn connected to n₀ output units. Scalar targets are set for each of the n₀ chosen indices to reflect their changes during the year. A sigmoidal form was chosen so that the targets T_i,y had a value near 1 for a rapidly rising index, near 0 for a rapidly falling index, and near 0.5 for small changes where maximum sensitivity is required.

$$T_{i,y} = \frac{1}{1 + \exp\{-[f(I_{i,y}) - f(I_{i,y-1})]\, W_i / S\}} \qquad (2)$$

The constant S controls the steepness of the sigmoidal function (as does the temperature in the neuron activation function). The value S=0.25 was used in this work. The network is thus forced to predict not only the index to be predicted, but also other chosen indices. As in computer models of the economy, the network is forced to interpret the index movements as a whole. The index weights W_i were chosen to maximise the network performance for the index being predicted. For example, if this is the Share Index, its weight might be set to 1, while that of the Interest Rate might be set to 0.4, and that of the M1 Money Supply to 0.2.
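The target calculation can be sketched as follows. This is a minimal illustration, assuming the sign convention that gives a target near 1 for a rising index, as the text describes; the function name `target`, the dictionary keys and the sample f values are hypothetical, while the weights and S=0.25 are those quoted above.

```python
import math


def target(f_now, f_prev, weight=1.0, steepness=0.25):
    """Sigmoidal target: near 1 for a sharp rise, near 0 for a sharp fall,
    and exactly 0.5 when the transformed index does not change."""
    z = (f_now - f_prev) * weight / steepness
    # Clamp so math.exp stays within floating-point range for extreme moves.
    z = max(-700.0, min(700.0, z))
    return 1.0 / (1.0 + math.exp(-z))


# Weights quoted in the text for a Share Index prediction.
weights = {"share_index": 1.0, "interest_rate": 0.4, "money_supply_m1": 0.2}

# Hypothetical transformed values f(I) for two consecutive years.
f_prev = {"share_index": 0.00, "interest_rate": 0.10, "money_supply_m1": -0.05}
f_now = {"share_index": 0.25, "interest_rate": -0.15, "money_supply_m1": -0.05}

targets = {name: target(f_now[name], f_prev[name], w)
           for name, w in weights.items()}
```

A smaller weight flattens the sigmoid for that index, so the same movement produces a target closer to 0.5 and contributes less to the training error.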

Figure 2. A Back Propagation network for annual predictions. Sets of 3 years from a 25-year training period are presented to the net. The year after the last training year is the prediction year. The method is tested using 14 prediction years from 1968 to 1981. Three years' values for each index form the inputs to a fully connected net.
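The arrangement in figure 2 can be sketched in a few lines of Python. The example builds one input pattern per training year from n_y consecutive years of each index and passes it through a fully connected net of logistic units. The helper names, the bias inputs, the random initial weights and the sample series are all assumptions made for illustration; training by back propagation is not shown.

```python
import math
import random


def make_patterns(series, n_y=3):
    """One input pattern per year t, holding years t, t-1, ..., t-n_y+1
    of every index. `series` maps index name -> values, oldest first."""
    if not series:
        raise ValueError("need at least one index series")
    names = sorted(series)
    length = len(series[names[0]])
    if any(len(series[name]) != length for name in names):
        raise ValueError("all index series must cover the same years")
    patterns = []
    for t in range(n_y - 1, length):
        row = []
        for name in names:
            row.extend(series[name][t - k] for k in range(n_y))
        patterns.append((t, row))
    return patterns


def logistic(x, temperature=1.0):
    return 1.0 / (1.0 + math.exp(-x / temperature))


def random_layer(n_in, n_out, rng):
    """Weight rows for one layer; the last weight in each row is a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(inputs, layers, temperature=1.0):
    """Propagate one pattern through successive fully connected layers."""
    values = list(inputs)
    for layer in layers:
        extended = values + [1.0]  # constant input feeding the bias weight
        values = [logistic(sum(w * v for w, v in zip(row, extended)), temperature)
                  for row in layer]
    return values


# Hypothetical transformed series for two indices over five years.
series = {"share_index": [0.0, 0.1, 0.3, 0.2, 0.4],
          "interest_rate": [0.5, 0.4, 0.4, 0.6, 0.3]}
n_y, n_i, n_h, n_o = 3, len(series), 4, 2
patterns = make_patterns(series, n_y)
rng = random.Random(0)
layers = [random_layer(n_y * n_i, n_h, rng), random_layer(n_h, n_o, rng)]
outputs = forward(patterns[0][1], layers)
```

Each output lies between 0 and 1, so it can be compared directly with a sigmoidal target for the corresponding index.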

3. OPTIMAL BACK PROPAGATION INPUT AND TARGET TRANSFORMATIONS.

The data presented to the network were transformed so that much of the prediction was performed by standard statistical transformations. For example, the Share Index over the period of Figure 1 shows a generally exponential growth. It is a waste of adjustable parameters to train the neural net to follow this variation. Improved predictions occur if the network is required to predict the deviations from this exponential growth. Each index i was transformed by taking logarithms, and fitting a regression line R_i,y = a_i + b_i log I_i,y. A further transformation defines a logarithmic deviation from the regression line with a fixed standard deviation D depending on the RMS deviation c_i of each index from the regression line:

$$f(I_{i,y}) = (D\, R_{i,y} / c_i) \, \log[\log(I_{i,y}) - \log(I_{i,y-1}) / R_{i,y}] \qquad (3)$$

The constant D was chosen as 0.25 in this study. The same transformation was applied when evaluating the network targets, which describe the changes from the linear regression line.
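The idea behind the transformation can be illustrated with a simplified stand-in. The sketch below is not a literal implementation of equation (3): it assumes an ordinary least-squares line fitted to the logarithm of the index against the year number, and rescales the residuals so that their RMS value equals D. The function name and the sample series are hypothetical.

```python
import math


def scaled_log_deviations(values, d=0.25):
    """Deviation of log(values) from a least-squares line, rescaled to RMS d."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three years to fit a line with residuals")
    logs = [math.log(v) for v in values]
    years = list(range(n))
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(years, logs)]
    rms = math.sqrt(sum(r * r for r in residuals) / n)  # plays the role of c_i
    if rms < 1e-12:
        return [0.0] * n  # essentially exponential series: nothing to rescale
    return [d * r / rms for r in residuals]


# Hypothetical annual index values (not data from the paper).
index_values = [100.0, 112.0, 118.0, 140.0, 150.0, 149.0, 181.0]
f_values = scaled_log_deviations(index_values)
```

Because every index is rescaled to the same spread, indices with very different volatilities reach the network on a comparable footing.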

The utility of this transformation was such that a converged fit to the training data could typically be obtained in 5000 presentations, rather than the 50000 required with no transformation. The speed of convergence was further improved by using the revision technique, in which presentation years are chosen statistically according to the current squared deviation from their targets (3). The presentation frequency of ill-fitting years is increased at the expense of years where a good fit has already been obtained.
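One way to picture the revision technique is as weighted random sampling of the training years. The sketch below assumes that the selection probability is directly proportional to the current squared error, with a small floor so that well-fitted years are still revisited occasionally; the exact rule used in reference (3) may differ, and the function name and error values are hypothetical.

```python
import random


def pick_training_year(squared_errors, rng, floor=1e-6):
    """Pick a year index with probability proportional to its squared error.

    `floor` is an assumption that keeps every weight strictly positive.
    """
    weights = [max(e, floor) for e in squared_errors]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(1)
squared_errors = [0.01, 0.01, 0.50, 0.01]  # hypothetical: year 2 fits badly
counts = [0] * len(squared_errors)
for _ in range(2000):
    counts[pick_training_year(squared_errors, rng)] += 1
```

In an actual training loop the squared errors would be recomputed as the weights change, so the sampling would shift away from a year once its fit improves.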

Although annual financial indices may be available to several significant figures, their true precision is limited by their statistical fluctuations during the year. This may be estimated as, say, 2% for the Share Index, 5% for the Interest Rate and 1% for the M1 Money Supply. Training on precise annual figures is counter-productive. Here the number of training points was increased by a factor of 25 by generating new index values having a Gaussian spread around the precise values. The same spreads were applied to the indices during testing, and the standard deviation of the resulting predictions was used to give a lower bound estimate of their reliability. Figure 3 shows the spreads in predictions of the Share Index from 1968 to 1981.
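The enlargement of the training set can be sketched as below. The example assumes that each new value is drawn from a Gaussian centred on the recorded value with a standard deviation equal to the quoted percentage of that value; the function name and the sample series are hypothetical, while the spreads and the factor of 25 are those given in the text.

```python
import random


def jitter_copies(values, relative_spread, n_copies=25, rng=None):
    """Return n_copies noisy versions of `values`; each point is drawn from a
    Gaussian centred on the recorded value with sigma = relative_spread * |value|."""
    rng = rng or random.Random()
    return [[rng.gauss(v, relative_spread * abs(v)) for v in values]
            for _ in range(n_copies)]


# Relative spreads quoted in the text.
spreads = {"share_index": 0.02, "interest_rate": 0.05, "money_supply_m1": 0.01}

share_index = [100.0, 112.0, 118.0, 140.0]  # hypothetical annual values
copies = jitter_copies(share_index, spreads["share_index"],
                       rng=random.Random(42))
```

Applying the same jitter to the inputs at prediction time and taking the standard deviation of the resulting outputs gives the kind of spread described above.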

Figure 3. The predicted annual change in the ordinary share index (solid line with the spread indicating the predicted statistical uncertainty), compared with the actual changes (dashed lines). The prediction was made with the weighting: ordinary share index 1, interest rate 0.4, Money supply M1 0.2. The network was presented with the current and two previous year values. It had 3 hidden units. It was trained with a learning rate of 4, a thermal energy of 1, and no momentum.

4. THE ACCURACY OF THE ORDINARY SHARE INDEX PREDICTIONS

Figure 4. The Share Index fitting coefficient (H_i in equation 1), during training (dashed line) and in prediction (solid line). They are plotted as a function of (i) the number of presented years n_y, (ii) the number of indices n_i and (iii) the number of hidden units n_h. All curves are for the full transformation with uncertainty predictions. Dotted curves are for multiple regression.

Figure 4 explores how the fitting coefficient H_i for the Share Index varies as a function of the network parameters. In (i) the fit is plotted as a function of n_y, the number of years presented simultaneously to the network. While the fit to the training data always improves with n_y, the fit to the testing data shows a minimum at around 3. Too large a network is spoilt by detail; too small a net cannot see the patterns. In (ii) the fit is plotted against the number of indices treated simultaneously. The large multi-variate networks gave excellent fits to the training data but poor prediction performance. The best prediction was for only two indices, the Share Index and the Interest Rate. In (iii) the two-index fit is plotted against the number of hidden units, with n_h = 4 giving the best prediction. The overall best prediction is for n_y = 3, n_i = 2 and n_h = 4.

Also marked on the figure are some conventional statistical predictions. The line marked Erg on the left axis assumes that the index follows along its mean regression line. In multivariate linear auto-regression (4), the movements of several indices are assumed to be predictable according to a linear function of the values, slopes, and curvatures of the indices up to the prediction year. Generally the method assumes a Taylor expansion involving n_y terms and n_i indices. Again a training period is used to define the best prediction coefficients so that the methods may be compared directly. The open circles in the figure show the fit as a function of n_y and n_i. The trends resemble those from back propagation but the results are worse. No comparisons were made with expert systems or economic models.

Thus by the use of multiple indices, the choice of index targets and transformations, and the optimisation of the network parameters from the prediction accuracy, a back propagation network is able to perform better than any of the alternative methods tried.

ACKNOWLEDGEMENTS

The author is grateful to A. T. Chadwick for many helpful discussions. The work was performed under the Underlying Research Programme of the United Kingdom Atomic Energy Authority.

REFERENCES

1. Economic Statistics 1900-1983, Economist Publications, London, 1985.

2. Parallel Distributed Processing, Ed. J. McClelland and D. E. Rumelhart, MIT Press, 1987.

3. C. G. Windsor, nEuro 88 Conference, p 592, Ed. L. Personnaz and G. Dreyfus, I.D.S.E.T., Paris, 1989.

4. R. H. Jones, Applied Time Series Analysis, Ed. D. F. Findley, Academic Press, 1978.