## Friday, January 29, 2010

### Practical Implementation of Neural Network based time series (stock) prediction - PART 1

The following introduction is meant to help readers understand the basic concepts and practical implementation of neural nets applied to a financial time series. I will not go too deep into the mathematics behind the neural net at the moment. My goal is to get you to understand the practical details of how to actually implement a neural net using simple tools and models. We will start with a simple model to understand a basic time series. The time series waveform is a simple sine wave with the period set to 30 days. It is implemented in Excel as a source file to be processed in any machine-learning-capable software. For this example I will be using a very good Java-based GUI program called Weka.

Fig 1. A simple sine wave with a period (T) of 30 days.

It is a very simple time series based upon the well-known sine wave model.
We can see that one complete cycle occurs over a period of 30 days. Each time step is set to 1 unit or day per step.
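The same series can be generated outside a spreadsheet. The following is a minimal sketch in Python, assuming one sample per day (fs = 1) and a 30-day period; the function name `sine_series` and the choice of 91 samples are illustrative and do not come from the original workbook.

```python
import math

PERIOD_DAYS = 30          # T: one full cycle every 30 days
F0 = 1.0 / PERIOD_DAYS    # fundamental frequency, in cycles per day
FS = 1.0                  # sampling frequency: one sample per day (assumed)


def sine_series(n_samples, f0=F0, fs=FS):
    """Return sin(2*pi*f0*n/fs) for n = 0 .. n_samples - 1."""
    return [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(n_samples)]


# Days 0..90 cover three full cycles plus the closing sample.
signal = sine_series(91)

# Print a few samples around the first cycle.
for day in (0, 7, 15, 22, 30):
    print(day, round(signal[day], 4))
```

The printed samples should rise toward +1 near day 7, cross zero near day 15, dip toward -1 near day 22, and return to zero at day 30.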

Fig 2. A complex sinusoidal signal with f1 set to 1/T, where T=30 days.

Anyone who has worked with financial time series knows that they can be far more complicated than simple sine-based models. However, it is often better to learn from basic principles and move up in complexity in order to have a good grasp of what we are doing. The second figure is a bit more complicated, as it is the sum of three different sine-based signals. Each signal has a different amplitude and frequency associated with it. We could use Fourier analysis to show the spectrum of the three different tones if we wished. However, for now we'll just accept that it is a complex signal. Notice that one property of this signal is also a bit optimistic: it is a stationary signal. Essentially, a stationary signal has statistical properties that do not change over time. For example, if we were to sample the average from different slices, it would not change much. We can also see visually that the time series is mean reverting. Financial time series differ in that they are not stationary; they typically have a unit root and must often be transformed before the neural network can process them. The purpose of the complex signal, however, is to show how we can move to an increasingly complex signal from a very simple model.
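To make the stationarity and Fourier remarks concrete, here is a rough sketch in Python. Only the 30-day fundamental comes from the post; the other two periods, all three amplitudes, and the function names are hypothetical values chosen for illustration. The DFT is a naive textbook version, not an optimized FFT.

```python
import cmath
import math

# Hypothetical tones as (amplitude, period in days). Only the 30-day
# fundamental is taken from the post; everything else is made up.
TONES = [(5.0, 30.0), (2.5, 15.0), (1.0, 7.5)]
N = 120  # four 30-day cycles, so every tone completes whole cycles


def complex_signal(n_samples, tones=TONES):
    """Sum of sine waves sampled once per day."""
    return [
        sum(a * math.sin(2.0 * math.pi * n / period) for a, period in tones)
        for n in range(n_samples)
    ]


def dft_amplitudes(x):
    """Naive O(N^2) DFT; returns the amplitude of each bin up to N/2."""
    n_samples = len(x)
    amps = []
    for k in range(n_samples // 2 + 1):
        acc = sum(
            x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        amps.append(2.0 * abs(acc) / n_samples)
    return amps


signal = complex_signal(N)

# Stationarity check: the mean of each 30-day slice should stay near zero.
slice_means = [sum(signal[i:i + 30]) / 30.0 for i in range(0, N, 30)]

# Spectrum: bin k corresponds to a frequency of k/N cycles per day.
amps = dft_amplitudes(signal)
peaks = [(k, round(a, 3)) for k, a in enumerate(amps) if a > 0.5]

print(slice_means)
print(peaks)
```

With these assumed tones, the slice means should all be approximately zero, and the spectrum should show three peaks at bins 4, 8, and 16 (periods of 30, 15, and 7.5 days) with amplitudes close to 5, 2.5, and 1.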

Fig 3. Normalized Complex Signal

The final step is to normalize the time series so that it is constrained to a vertical range (what we call the rails) of minus 1 to plus 1. A typical neural net is limited by an internal function, sometimes called a squashing function. This is a non-linear processing function, usually a sigmoid or tanh (hyperbolic tangent) function, which saturate at (0,1) and (-1,1), respectively.
A simple transformation is xnew = (xold - vmino)*(vmaxn - vminn)/(vmaxo - vmino) + vminn. When the old and new ranges are both centered on zero, this reduces to xnew = xold*(vmaxn - vminn)/(vmaxo - vmino).
Here vmaxn and vminn are the new maximum and minimum values of the time series, and vmaxo and vmino are the old ones. In this case we will use -.9 and +.9 as the limiting rails so as to avoid saturation effects. Often software will do the normalizing for you. Weka can perform this operation for you, in which case no manual normalization is necessary, although we should understand it for future reference.
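A minimal sketch of the rescaling step in Python, assuming the standard min-max form that subtracts the old minimum before scaling and adds the new minimum afterwards. The function name `rescale` and the sample values are hypothetical; the -0.9 and +0.9 rails follow the text.

```python
def rescale(values, new_min=-0.9, new_max=0.9):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("cannot rescale a constant series")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]


raw = [-8.0, -2.0, 0.0, 3.0, 8.0]   # made-up sample values
normalized = rescale(raw)
print(normalized)
```

The smallest input lands on the lower rail and the largest on the upper rail; everything in between keeps its relative position.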

That's it for Part 1. Next we will investigate how to transport the data to Weka and have it build a model and predict the out-of-sample signal set!

Please add any comments on where I can improve my tutorial as I am new to the blogger scene and appreciate any feedback.

1. Thanks for the demonstration. I am looking forward to seeing how Weka can be used to predict stocks.

2. Thank you for putting this up.

"...My goal is to share some concrete examples for the layman to be able to build and replicate...."

in your "about me" I thought: "That's for me." But my level seems to be even below "layman". :-)

Because of you ;-) I had to find out how to get a sine wave within OpenOffice Calc (=SIN(A2*PI() / 180)) and how to graph it. Sin, cos, tan? That's too long ago. But I managed to get to the first step. Unfortunately my sine wave with 30 periods doesn't look like yours. I have different values and more cycles. It probably has to do with this "fo" value of 0.033333 which I see in your figure1. That's how far I got. Don't laugh! :-)

Now don't ask me how you calculated the Complex sine wave in your second figure.

Is there a way for you to post your XLS file or quickly list the formula you used in those cells?

Thanks

3. Thanks for responding; one of these days I'm going to have to figure out how to get a bit more organized, attach files, and/or create a repository.

The sine wave is defined as signal(n) = sin(2*pi*fo*n/fs), where n is the sample index, fo=1/To, and To is the period you want.
In the example, To=30 days.
fs=1/Ts where Ts is the sample interval in column A.

so H2=30 (length of period)
I2 =1/H2 (fundamental frequency)
col A = 0 to length of signal
col B = sin(2*pi*$I$2*colA(t)), where t is just the corresponding row number.

The complex sines are just built using summation of series with different fundamental tones, and some arbitrary amplitudes. Amplitude is the coefficient of the signals.

Apologies if it came across as complex; I'll make an effort to label things where possible. Also, sampling theory is not covered in basic trig, so don't feel bad. Although it is based on trig, the concepts are borrowed from digital signal processing theory.

4. Beautiful. Thanks!

Now my sine wave looks more like yours and the values in my table seem to correspond with yours. Except that at T 15, 30, 45, etc. I have a value of "0" while you have other values in those places. Probably not too important.

Now I will try to figure out how to "build a summation of series" to create a complex sine like yours.

Even if this is basic (or even below basic), to me this is already tremendously helpful. I'm looking forward to starting to use WEKA and other tools on time series and learning even more.

Thanks!

5. Good Job. Those values are called nulls and should be zero. My values are so small, they are approximately zero, although sometimes numerical precision errors show up so as to appear slightly different than zero. The same issue will occur using Weka. Glad to hear you were able to replicate. Even if your signal was slightly different, however, it should still work for the tutorial.
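The precision effect at the nulls is easy to reproduce. This is a small sketch in Python, assuming the same 30-day period and daily sampling; the tolerance of 1e-9 is an arbitrary choice for illustration.

```python
import math

F0 = 1.0 / 30.0  # fundamental frequency for a 30-day period

# Samples at half-period multiples, where the ideal sine is exactly zero.
nulls = {t: math.sin(2.0 * math.pi * F0 * t) for t in (15, 30, 45)}

for t, value in nulls.items():
    # Compare against zero with a tolerance instead of using ==.
    print(t, value, math.isclose(value, 0.0, abs_tol=1e-9))
```

Because pi cannot be represented exactly in floating point, the computed values are typically tiny non-zero numbers rather than exact zeros, which is why a tolerance-based comparison is the safer test.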

For the summation, just create two more series, and randomly change the fo values, like 1/To*2 for example, and the amplitudes, such as 5*series1 + 2.5*series2... You can play with these values and the graph in Excel will automatically update, so you can see the complex signal being created.

6. This starts to be fun!

Originally I only wanted to learn a little bit more about how I could (maybe) use machine learning in trading. Now I end up learning a little bit more about Matlab (well at the moment it is Freemath).

I couldn't find a good example of how to sum two or more sine waves in Excel (I thought this was much more complex), so I ended up finding an example for Matlab. It looked pretty simple to do, but that meant that I first had to learn a little bit more about Matlab. But as Matlab is too expensive just to draw some simple sine waves once in a lifetime, I chose Freemath for the job.

If somebody with the same (low math) knowledge level comes after me to your blog I hope that your explanation above and the Matlab example I found (http://www.owlnet.rice.edu/~elec241/matlab.html) will help them further.

Moving on to the next step...

7. thanks
plz
for problem
for my article about stock prediction using decision tree
thanks alot

8. Hi baraa,

I'm not exactly sure what article you are writing. You can email me with any relevant information/questions at intelligenttradingtech@yahoo.com

Cheers.

9. Please tell me how I can prepare the data set in an Excel sheet before entering it into Weka. What did you do to make it look like this sheet?

10. baraa,

Thanks for all the comments. I've tried to outline the steps in as much detail as possible. However, I should work on making them clearer for some of the audience. I will try to get to your personal emails if I get a chance.

11. Hi, thank you. I understood every part; the problem is just how to specify the bias. You set s-1, s-2, s-3, s-4, but how? Please just explain it to me in a comment. I am waiting.

Just explain how I can prepare the table in your part 2 that contains the bias, s1-4, and the main signal, because I want to continue. I really benefited from your part 1 and part 2 and the use of Weka. Thank you very much. I will not take up your time; just let me know in a short comment how to prepare the bias table.

12. Thanks for all the comments and please keep them coming for any areas that are unclear.

baraa,
Hi again. The bias is one column of the data that is constant (all ones). The other values, s-1, s-2,... are called embedded or lagged values.

Suppose you had a main signal defined at consecutive time slots. The signal, the s-1 (lagged one) signal, and the s-2 (lagged two) signal would then be:

| signal | s-1 | s-2 |
|---|---|---|
| 5 | 4 | 3 |
| 6 | 5 | 4 |
| 7 | 6 | 5 |
| 8 | 7 | 6 |
| 9 | 8 | 7 |

What you are doing is simply creating time-lagged versions of the main signal you are going to process. The n in s-n is the number of lags used for that column. The only other columns are the bias (all ones) and the signal itself.
Hopefully, this makes sense.
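The lagged table described above can be built programmatically. Below is a minimal sketch in Python; the function name `lagged_table` and the column order (bias, s-1, s-2, ..., signal) are assumptions for illustration, and the input series is the 5..9 example extended backwards by two values so that both lags are defined.

```python
def lagged_table(series, n_lags):
    """Build rows of [bias, s-1, ..., s-n_lags, signal] from a 1-D series."""
    rows = []
    # Start at n_lags so that every lag has a value to look back to.
    for t in range(n_lags, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append([1] + lags + [series[t]])  # bias column is constant 1
    return rows


series = [3, 4, 5, 6, 7, 8, 9]
table = lagged_table(series, n_lags=2)

print(["bias", "s-1", "s-2", "signal"])
for row in table:
    print(row)
```

Note that the first `n_lags` samples are dropped, since they have no complete set of lagged values.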

13. Is this a prediction for one day forward, comparing actual and predicted values?

14. baraa, in this case yes.

I did receive several of your posts, but I'm going to limit them as I don't want to bog down the comments.

I suggest you get in touch with the Nabble help group for Weka, as they have full-time moderators available to guide you through the process step by step.

In addition, you should pick up the book on Weka I recommended and try to walk through some of the examples.