Wednesday, April 28, 2010

Wavelet Spectrogram Non-Stationary Financial Time Series analysis using R (TTR/Quantmod/dplR) with USDEUR

I've been doing some research lately regarding types of spectral imaging and decomposition techniques that apply to non-stationary signals. As mentioned earlier, one of the major problems with simple Fourier analysis is that the basis functions extend to infinity in both directions and the signals are assumed to be stationary. Although I won't expand too much right now, one of the advantages of wavelets is that they use small, locally windowed basis functions, allowing them to capture not only non-stationary signals, but signals that are aperiodic: two large advantages over Fourier-based methods when dealing with financial time series.

I put together a few small examples to show how to visually interpret a spectrogram.



Fig 1. Simple 58 day cycle captured with 11 octaves and 2048 (2^11) data points

As in earlier tutorial-based posts, we use a simple 58 day cycle to show the basic sine-based time series waveform. The plot on the bottom is known as a spectrogram. The type of wavelet operation behind this spectrogram is a continuous wavelet transform using a Morlet wavelet. The package is dplR (the Dendrochronology Program Library) put together by Andy Bunn. The package was designed to analyze tree rings. Note that a multitude of tools utilize this type of technology, ranging from MRI, to climatology, to speech processing. It is, IMO, the modern-day version of DFT-type spectral tools (but for non-stationary and aperiodic signals). Now, looking at the spectrogram plot, please keep in mind the units are days, not years (I need to see how to alter that; hopefully Dr. Bunn is listening=).

The time scale represents linear time, or a window of 2048 days that was sampled. We could have used any time series, but it needs to be of length 2^N; if not, there is a function to pad the rest of the data with zeros to make up that length. The vertical scale is a log scale showing what are called 'octaves'. Borrowing from musical vernacular, we can think of them as scales that double in magnitude from one scale to the next and represent localized frequency energy information at those scales. The colors represent the heat, or power, of the signal in regions of interest. Due to some issues with this transform, we ignore uncertain information outside of the dark parabolic region (the cone of influence). It is clear that the highest power is the dark red region right at around 58 days. What is important here is not so much the exact value of the cycle, but the persistence of the dominant cycle(s). We notice the cycle persists throughout the entire length of the spectrogram (much as we would expect from the 2D time series plot).
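Below is a minimal sketch of the kind of pipeline behind Fig 1 (the exact parameters are my assumptions, not the original code): a pure 58 day sine cycle sampled over 2048 = 2^11 points, run through dplR's continuous Morlet wavelet transform and spectrogram plot.

library(dplR)

n <- 2048                                  # 2^11 data points, as in Fig 1
t <- 1:n
y <- sin(2 * pi * t / 58)                  # simple 58 day cycle

# morlet() performs the continuous Morlet wavelet transform; p2 sets the
# number of octaves (powers of two) and dj the sub-octave resolution.
wave <- morlet(y1 = y, x1 = t, p2 = 11, dj = 0.1, siglvl = 0.99)
wavelet.plot(wave)                         # power vs. time and scale, with cone of influence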

What happens if we use different frequencies that change over time? Here we notice a clear advantage over Fourier-based methods. A Fourier-based decomposition would be able to locate the dominant tones; however, because it uses infinite basis functions, the reconstructed signal would not capture where in time the different frequencies occur.



Fig 2. Composite Stationary Time Series comprised of 3 dominant tones

Notice that we can clearly see the regions of dominant tones by following the chart and looking for the most concentrated power (red) regions, which are around the 48, 253, and 532 day cycles. We also notice that the power density can be viewed in a time context: our eyes simply follow along in time and observe strong regions of signal energy concentration.
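As a sketch (cycle lengths taken from the figure, equal amplitudes assumed), the composite series for Fig 2 can be built by summing three tones and feeding it through the same morlet()/wavelet.plot() pipeline as above, reusing t from the earlier snippet:

cycles <- c(48, 253, 532)                  # the three dominant tones noted above
y.mix  <- rowSums(sapply(cycles, function(p) sin(2 * pi * t / p)))

wave.mix <- morlet(y1 = y.mix, x1 = t, p2 = 11, dj = 0.1)
wavelet.plot(wave.mix)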

OK, but what if the signal itself is non-stationary?



Fig 3. Composite signal added to exponential curve to make signal non-stationary

Notice that even though we now have a non-stationary signal, the regions of underlying cyclic component stability are still detectable by eye!
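For Fig 3 the same composite series simply gets a slow trend added to it; a short sketch (the exact trend used is an assumption on my part):

y.ns <- y.mix + exp(t / 1000)              # exponential drift makes the series non-stationary
wavelet.plot(morlet(y1 = y.ns, x1 = t, p2 = 11, dj = 0.1))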

Lastly, a financial time series of USDEUR was captured via TTR/Quantmod packages.
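A sketch of how that might be done (the date range, and trimming to a power-of-two length rather than zero-padding, are my assumptions; Oanda may also cap the available history):

library(quantmod)

getFX("USD/EUR", from = "2002-01-01", to = "2010-04-01")   # creates USDEUR from Oanda
px <- as.numeric(USDEUR)
px <- tail(px, 2048)                       # keep the most recent 2^11 observations

wave.fx <- morlet(y1 = px, x1 = seq_along(px), p2 = 11, dj = 0.1)
wavelet.plot(wave.fx)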



Fig 4. USDEUR time series spectrogram

Notice that even with the non-stationary financial signal, there is a very clear dominant cycle that persists at roughly 255 days (anyone familiar with trading will recognize that as the approximate number of trading days per year).

Keep in mind that there are also aliases (and spreading) present in sampled data, which may look like periodic signals but are merely digital artifacts of the underlying sampled signal. We also see the very short-term noise present at the smallest scales at the bottom of the plot.

Another interesting point is that this may be used not only as a modern tool to augment non-stationary decomposition; for those familiar with pattern-based techniques, it (and its periodogram counterpart) is also often used in pattern recognition and Markov-type modeling.

That's all for now. Hopefully, you have gained some appreciation for wavelet-based spectral techniques vs. Fourier-based spectral analysis.

I have been debating whether to break up the post, but because I was added to the R bloggers thread, I wanted the post to be complete for local readers.

That's it for now.

Saturday, April 3, 2010

Why isn't my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? Part 2

I created an example to show how the theory from Part 1 might be applied, using the S&P 500 as a proxy for performance. Just in case anyone viewing is not familiar with terminal wealth, it is the final (usually compounded) ending value of the account (hence, terminal).



Fig 1. Example of S&P 500 and using GBM monte carlo simulations for terminal wealth

A Monte Carlo simulation of GBM, using historical daily % change parameters (mean, std), was run for 10,000 iterations with a time series length of 1000. The length was chosen to approximate slices of about 3 yrs for summary statistics of terminal wealth (a good approximation for market timing). I also used the long-term historical mu and std of the series, although this might be a bit biased towards longer horizons. Possibly, I could generate more of a 3 yr sampling distribution of N(mu, std) for more relevance, but for now we'll assume the long-run parameters are a good approximation.
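A minimal sketch of that simulation (the seed, data range, and the use of compounded normally distributed daily returns as a discrete stand-in for GBM are my assumptions):

library(quantmod)

getSymbols("^GSPC", from = "1950-01-01")   # S&P 500 as the performance proxy
rets  <- as.numeric(dailyReturn(Cl(GSPC)))
mu    <- mean(rets)                        # long-run daily mean
sigma <- sd(rets)                          # long-run daily std

n.sims <- 10000
n.days <- 1000
set.seed(1)

sim <- replicate(n.sims, {
  r <- rnorm(n.days, mean = mu, sd = sigma)    # simulated daily % changes
  c(prod(1 + r),                               # 1X terminal wealth
    prod(1 + 2 * r),                           # +2X, rebalanced daily
    prod(1 - 2 * r))                           # -2X, rebalanced daily
})

ser     <- sim[1, ]
ser2pos <- sim[2, ]
ser2neg <- sim[3, ]

boxplot(list(`1X` = ser, `+2X` = ser2pos, `-2X` = ser2neg), log = "y")
sapply(list(ser = ser, ser2pos = ser2pos, ser2neg = ser2neg), summary)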

Graphical summary statistics using boxplots and density estimates are shown for the Monte Carlo simulations. What strikes me at first glance is that the -2X instrument performs absolutely horribly in most cases, reinforcing the common knowledge that markets have upward drift. If you are ever stuck holding a position, just hope it isn't short (we've all experienced the deer-in-the-headlights phenomenon at one time or another); statistically, it is not the best side to be stuck on for any long period.

Another, more interesting, observation is that the mode of the simple 1X underlying instrument sits to the right of the leveraged density estimates. In addition, you are clearly taking on wider variance/risk by using the positively (or negatively) leveraged 2X instruments. In essence, you are seeing some of the Kelly principles at work here: by taking on 2X risk, while you have a chance of larger gains, statistically you are not likely to do much better than 1X, while taking on far greater risk on the downside.

Lastly, two sample slices of the actual results are shown, using arbitrary periods of performance. It is clear that during periods of long trends we get much better growth in the 2X instrument; unfortunately, we don't know when those trends will occur, and, according to the Monte Carlo sims, they are not that likely to occur.

The most recent performance displayed is a perfect example of a series where both 2X instruments performed worse than the underlying, as explained in Part 1.

Below is a summary of the three series, ser(1X), ser2pos(+2X), ser2neg(-2X)
> summary(ser)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3613 1.0800 1.3290 1.3870 1.6250 4.7460
> summary(ser2pos)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1178 1.0630 1.6100 1.9180 2.4070 20.5700
> summary(ser2neg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0337 0.2859 0.4279 0.5173 0.6483 5.6480

Notice the Median of +2X is nowhere near 2 times the Median of the underlying. Although 2X has some fantastic outliers, you shouldn't expect them statistically.
It's sort of like tossing a coin while compounding your full stake each time: you get a fantastic result for the winning outcome, but unfortunately there is a 75% probability of going bankrupt (maybe I'll cover that one another time).
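A quick check of that coin-toss intuition, under my own reading of the 75% figure (assuming a fair double-or-nothing bet over two tosses, where a single loss wipes out the stake):

p.win  <- 0.5
p.ruin <- 1 - p.win^2                      # any loss within two tosses = bust
p.ruin                                     # 0.75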

One final comment is that the Monte Carlo sims used GBM, whereas a more realistic jump-diffusion process would create much fatter tails, meaning even more negative tail risk set against the potentially nice-looking gains of the 2X instrument.