Friday, September 23, 2011

Arc Diagram and spatiotemporal data mining visualization

I won't spend too much time on this fascinating topic, other than to say it relates closely to prior discussions about pattern discovery via visual data mining (see lexical dispersion plots, for example). I happened across an interesting visualization method called the Arc Diagram, developed by Martin Wattenberg. Working for data visualization groups at IBM and later Google, he developed some interesting visual representations of spatiotemporal data.



Fig 1. Arc Diagram and legend with example of discretized pattern archetype.

The resulting plot generates some fascinating temporal signatures, similar to what one might see in phase-space portraits from chaos theory. Arc Diagrams have frequently been used to look for spatiotemporal signatures in music. One might discern a type of underlying order or visual signature of complexity, as well as recurring patterns, in sequential objects ranging from text-based lyrics to sheet music.

Figure 1 shows an example of how one might apply this tool to temporal pattern discovery in time series. A weekly series from SPY has been discretized into alphabet tokens, based upon the bin ranges in the included legend. The small chart in the example decodes one archetypal pattern, the sequence ECDCECCD, back into a time series representation of the 8-week symbol string. An interactive Java tool from another blogger, Neoformix, was then used to translate the data into an Arc Diagram (http://www.neoformix.com/Projects/DocumentArcDiagrams/index.html). Read from top to bottom, the diagram shows recurring and related patterns that repeat over time; certain behavior might warrant further investigation.
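The discretization step can be sketched in a few lines of R. This is only a minimal illustration: the bin edges below are hypothetical stand-ins (the real ranges are the ones in the Fig 1 legend), the function name discretize is my own, and the toy returns are made up rather than taken from SPY.

```r
# Hypothetical bin edges for weekly returns; the actual ranges are those
# shown in the Fig 1 legend, which are not reproduced here.
breaks <- c(-Inf, -0.03, -0.01, 0.01, 0.03, Inf)
tokens <- LETTERS[1:5]  # "A" (most negative bin) through "E" (most positive bin)

# Map each return to the letter of the bin it falls into
discretize <- function(rtn, breaks, tokens) {
  as.character(cut(rtn, breaks = breaks, labels = tokens, right = TRUE))
}

# Toy weekly returns, invented for illustration only (not SPY data)
toy_rtn <- c(0.04, 0.00, 0.02, -0.005, 0.05, 0.003, 0.009, 0.015)

symbols <- discretize(toy_rtn, breaks, tokens)
pattern <- paste(symbols, collapse = "")  # one string of tokens
print(pattern)
```

The resulting token string is the kind of sequence one would then paste into an arc diagram tool.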

You can copy the following data stream into the tool and toy around with it to get a feel for the possibilities of visual pattern discovery.*  I won't go into too much more detail about using it, other than to say it appears to be a very useful tool for temporal pattern discovery.

Please see the following for more ideas on arc diagrams and musical signatures:
http://www.research.ibm.com/visual/papers/arc-diagrams.pdf

http://turbulence.org/Works/song/mono.html

Blog mentioned:
http://www.neoformix.com/

* Not sure how to attach .xls file here, but if anyone wants a copy of the .xls file, you can send me an email and I'll try to get it out to you.  Otherwise, you can simply grab a song lyric off the web to play with the tool.

Thursday, August 4, 2011

Aug 4, 2011 "plunge" headlines are in the air tonight

Today's financial headlines are littered with the word 'plunge.'  Considering today's close-to-close (cl-cl) return on the S&P500 was just about -5%, I don't know that I would exactly call that a plunge.


                      Fig 1. Historical ts plot of S&P500 returns <= -5%

The following R code produced a time series plot of historical occasions where this occurred.

###################################################

library(quantmod)

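# download daily S&P 500 index data from Yahoo Finance (quantmod's default source)
getSymbols("^GSPC",from="1950-01-01",to="2012-01-01")
adj<-GSPC$GSPC.Adjusted
# 1-day close-to-close simple returns; drop the leading NA
rtn<-(adj/lag(adj,1)-1)[2:length(adj)]
# keep only the days with a return of -5% or worse
r05<-rtn[rtn<= -.05]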

plot(sort(r05),type='o',main='S&P500 1950-present returns <= -5%')

###################################################
Although such occurrences are arguably rare, the 1987 drop is much more worthy of the 1-day label 'plunge.'

One other disturbing observation in the data, however, is the heavy temporal clustering of occurrences around 2008.  Now that's behavior to be concerned about (not to mention revised flash crash data points).
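One rough way to put numbers on that clustering is to count the qualifying days per calendar year. The sketch below uses base R only; count_by_year is a hypothetical helper name, and the toy dates and returns are invented purely to show the mechanics.

```r
# Count how many returns at or below a threshold fall in each calendar year.
# dates: Date vector; rtn: numeric vector of the same length.
count_by_year <- function(dates, rtn, threshold = -0.05) {
  hits <- dates[!is.na(rtn) & rtn <= threshold]
  table(format(hits, "%Y"))
}

# Toy input, invented for illustration only (not S&P 500 data)
toy_dates <- as.Date(c("2000-01-03", "2000-01-04",
                       "2001-06-01", "2001-06-04", "2001-06-05"))
toy_rtn <- c(-0.06, 0.01, -0.05, -0.07, 0.02)

counts <- count_by_year(toy_dates, toy_rtn)
print(counts)
```

With the xts series from the code above, a call along the lines of count_by_year(index(rtn), as.numeric(rtn)) should give the per-year tally, assuming the download succeeds.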

filtered 1 day cl-cl returns <=-5% sorted by date

Thursday, July 28, 2011

Pattern Recognition: forward Boxplot Trajectories using R

Although the following discussion can apply to the Quantitative Candlestick Pattern Recognition series, it addresses the same issue faced by any basic conditional system: how and when to exit.  The following is one way to visualize and think about it, and is by no means optimal.



                                    Fig 1. Posterior Boxplot Trajectory

Often we attempt to find some set of prior input patterns that leads to profitable posterior outcomes.  However, in most of the available examples, we are typically only given heuristics and rules of thumb on where to exit.  This might make sense, since no one can accurately predict where to exit. However, with knowledge of past samples, we can have some idea of where a good exit target might be, given the prior knowledge of forward trajectories.  I've dubbed this a 'boxplot trajectory' here, as I think it's a useful way to visualize a group of many possible outcome trajectories for further analysis.

In this example, a set of daily price based patterns was analyzed via a proprietary program I wrote in R, which resulted in an input pattern yielding a set of 52 samples that met my conditional criteria.  Fig 1 illustrates a way to look at the trajectory outcomes based upon one of the profitable patterns in the conditional criteria. The bottom graph is simply the plot of the median result at each data point in the trajectory. We often try to imagine the best way to exit without foreknowledge of the future (and with somewhat less reliance on rule-of-thumb criteria).
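For anyone wanting to draw something similar, here is a minimal sketch in base R. It is not the proprietary program mentioned above: the trajectories are simulated random walks, the 12-bar horizon is an assumption, and I am assuming each point is the cumulative return since entry.

```r
set.seed(1)
n_samples <- 52   # number of pattern matches, as in the example above
n_bars    <- 12   # assumed forward horizon in bars

# Simulated per-bar returns, for illustration only (not real pattern outcomes)
steps <- matrix(rnorm(n_samples * n_bars, mean = 0, sd = 0.01),
                nrow = n_samples)

# One row per sample, one column per forward bar: cumulative return since entry
traj <- t(apply(steps, 1, function(r) cumprod(1 + r) - 1))
colnames(traj) <- seq_len(n_bars)

# Median outcome at each forward bar
bar_medians <- apply(traj, 2, median)

# Top panel: one boxplot per forward bar; bottom panel: the medians
op <- par(mfrow = c(2, 1))
boxplot(traj, xlab = "bars after entry", ylab = "return since entry",
        main = "Boxplot trajectory (simulated)")
plot(bar_medians, type = "o", xlab = "bars after entry",
     ylab = "median return", main = "Median of each bar")
par(op)
```

Passing a matrix to boxplot() draws one box per column, which is what makes the samples-by-bars layout convenient here.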

                                      Fig 2. Trajectory tree.

One approach would be to use some type of exit rule based upon the statistical median of each sequential point's range.  Knowing that 1/2 of the vertices occur above and 1/2 below the median, we should expect to hit at least 1/2 of the targets at or above the median. Given that the 3rd point has the highest median, it makes sense to exit there rather than wait for a greater gain further out (where the median is even lower).  So if we take the median value of the 3rd point as the target, we achieve a fixed target of 1.59% on 27/52 of the total samples.

Of the remaining samples, we may now wish to exit on the 11th bar (or earlier if the same target is hit earlier) at a target of .556%, which is achieved on 13 of the remaining 25 samples (13/52 of the total).  This leaves only the last bar, for which we simply use the average return as the return value for that target, in this case -1.74% for the final 12/52 of the samples. Notice that the worst contenders are always the ones put off until the end.
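The staged rule can be written down as a small function. This is one possible reading of the rule rather than the exact logic I ran: each stage has its own target that is checked only over its own window of bars (bars 1-3 for the first target, bars 4-11 for the second), and anything left over exits at the final bar. The name staged_exit and the three toy paths are made up for illustration.

```r
# path:    cumulative returns since entry, one value per forward bar
# bars:    last bar of each stage, in increasing order (e.g. 3, then 11)
# targets: profit target for each stage
# Returns the realized return for this path under the staged rule.
staged_exit <- function(path, bars, targets) {
  start <- 1
  for (k in seq_along(bars)) {
    window <- path[start:bars[k]]
    if (any(window >= targets[k])) return(targets[k])  # target hit: exit there
    start <- bars[k] + 1
  }
  path[length(path)]  # no target hit: exit at the last bar
}

# Three toy 12-bar paths, invented for illustration only
p1 <- c(0.005, 0.02, 0.01, rep(0, 9))                     # hits stage 1 target
p2 <- c(0.001, 0.002, 0.003, 0.004, 0.005, 0.005, 0.006,
        rep(0.001, 5))                                    # hits stage 2 target
p3 <- c(rep(-0.01, 11), -0.02)                            # hits neither
toy <- rbind(p1, p2, p3)

results <- apply(toy, 1, staged_exit,
                 bars = c(3, 11), targets = c(0.0159, 0.00556))
print(results)
```

Checking each target only inside its own window keeps the rule free of look-ahead: at any bar, only one target is live.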

The expectation yields E(rtn) = 27/52*.0159 + 13/52*.0056 - 12/52*.017 = .0057,
eking out a small positive average gain of .57%. Compounded, this gives:
(1+.0159)^27*(1+.0056)^13*(1-.017)^12 ~ 1.34, or about a 34% rtn for 52 trades, each less than 3 days in length.  Hit rate (as secondary observation) is 77% in this case.
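The arithmetic is easy to check in R. The counts and the rounded returns are the ones quoted above; the variable names are just for illustration.

```r
n <- c(27, 13, 12)              # samples exiting at each stage
r <- c(0.0159, 0.0056, -0.017)  # return realized at each stage (rounded)

expected   <- sum(n / sum(n) * r)          # weighted average return per trade
compounded <- prod((1 + r)^n) - 1          # return from compounding all 52 trades
hit_rate   <- sum(n[r > 0]) / sum(n)       # share of trades with a positive exit

print(round(c(expected = expected, compounded = compounded,
              hit_rate = hit_rate), 4))
```

This should come out at roughly .0057, .34 and .77, in line with the figures in the text.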

The approach is particularly appealing for a high frequency strategy with very low commissions. Notice it's by no means comprehensive (and yes, I've only shown in-sample results here), but rather a novel way to think about exit strategies.