Friday, October 14, 2011

Free auditing of Stanford AI and Machine Learning Courses w/Peter Norvig

Just wanted to notify viewers of a few great courses that are being offered free for auditing and/or participation by well-known industry experts, including Peter Norvig, co-author of the classic AI text 'Artificial Intelligence: A Modern Approach,' and Prof. Andrew Ng.

http://www.ai-class.com/
see also,
http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2011/10/14/BUFR1LH9JR.DTL

The notice is a bit late, but they are still accepting registrations.

Thursday, October 6, 2011

Spatio-Temporal Data Mining: 2

There are many visual methods used to identify patterns in space and time. I've discussed some in prior threads and will show a few others briefly here. One of the most difficult questions I often hear from others regarding Markov-type approaches is how to identify the states to be processed.

It is similar to the problem one encounters with simple linear factor analysis. Unfortunately, there is no simple answer, and because data streams are becoming so vast, it is almost impossible to enumerate all possible state sets. Visual mining techniques can be incredibly helpful in narrowing down that space, as well as in feature reduction. I often go back and forth between these types of visualizations and unsupervised classification learners to converge on useful state identifications.


                                       Fig 1. Spatio-Temporal State plot

Figure 1 gives an idea of how to visualize states with respect to time. But such knowledge in isolation isn't of much use. We are more interested in Bayesian-type relationships between states, that is, the probabilities that link one state to another in time.


                                              Fig 2. Fluctuation Plot

Several visual methods exist to capture these relationships. One plot common in language processing and information theory is the fluctuation plot. The plot above was built using the same state data as the first graph. It is often used to show conditional relationships between symbols such as alphabet tokens. The size of each box is directly proportional to the weight of the transition probability between the row state and the column state in the tabular data. For example, the letters 'yzy' are more commonly followed by 'g' (as in syzygy) than by any other token; thus, one would expect to quickly spot a larger box where the row for the 'yzy' n-gram meets the column for the 'g' token.
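To make the table behind such a plot concrete, here is a minimal sketch of estimating first-order transition probabilities from a state sequence. It is written in Python rather than the R used for the figures, the sequence is made up (it is not the data behind the plots), and the function name is just for illustration. Each row of the result corresponds to a row of boxes in a fluctuation plot.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(next state | current state) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1

    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Normalize each row so its probabilities sum to 1.
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Hypothetical state sequence, for illustration only.
sequence = list("AABABCABAA")
table = transition_probabilities(sequence)

for row_state in sorted(table):
    cells = ", ".join(
        f"{col}: {p:.2f}" for col, p in sorted(table[row_state].items())
    )
    print(f"{row_state} -> {cells}")
```

Higher-order rows (such as the 'yzy' n-gram) would work the same way if the current state were taken to be a tuple of the last few tokens instead of a single token.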

Both plots were produced in R. ggfluctuation() is a plotting command from ggplot2. I am currently investigating how much easier and faster it might be to produce such visualizations in tools like Protovis and Processing. I've been especially inspired by reading some of Nathan Yau's excellent visualization work in his book, 'Visualize This.' I included it in the link to the right for interested readers.

Friday, September 23, 2011

Arc Diagram and spatiotemporal data mining visualization

I won't spend too much time discussing this fascinating topic other than to say it relates very much to prior discussions about pattern discovery via visual data mining (see lexical dispersion plots for example).  I happened across an interesting visualization method called the Arc Diagram, developed by Martin Wattenberg. Working for data visualization groups at IBM and later Google, he developed some interesting visual representations of spatiotemporal data.



Fig 1. Arc Diagram and legend with example of discretized pattern archetype.

The resulting plot generates some fascinating temporal signatures, similar to what one might see in phase-space portraits from chaos theory. Arc diagrams, however, have most often been used to look for spatiotemporal signatures in music. One might discern a type of underlying order or visual signature of complexity, as well as recurring patterns, in sequential objects ranging from text-based lyrics to the notes of sheet music.

Figure 1 shows an example of how one might apply this tool to temporal pattern discovery in time series. A weekly series from SPY has been discretized into alphabet tokens, based upon the bin ranges in the included legend. The small chart in the example decodes an archetypal pattern, the sequence ECDCECCD, back into a time series representation of the eight-week symbol sequence. The following interactive Java tool from another blogger, Neoformix, was then used to translate the data into an Arc Diagram: http://www.neoformix.com/Projects/DocumentArcDiagrams/index.html . Reading from top to bottom, one can look for recurring and related patterns that are repeated over time; certain behavior might warrant further investigation.
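As a rough illustration of the discretization step, here is a minimal Python sketch that maps a numeric weekly series onto alphabet tokens. The bin edges, the five-letter alphabet, and the input values are all assumptions made up for this example; the actual ranges are the ones in the figure legend, which is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical bin edges (say, weekly percent changes). These are NOT the
# ranges from the figure legend; they only illustrate the mechanics.
BIN_EDGES = [-3.0, -1.0, 1.0, 3.0]
TOKENS = "ABCDE"  # one more token than there are edges


def discretize(values, edges=BIN_EDGES, tokens=TOKENS):
    """Map each value to the token of the bin it falls into."""
    # bisect_right returns how many edges are <= the value, which is
    # exactly the index of the bin (0 for the lowest, len(edges) for the top).
    return "".join(tokens[bisect_right(edges, v)] for v in values)


# Made-up eight-week series, chosen so the result spells ECDCECCD.
weekly_changes = [3.5, 0.2, 1.4, -0.5, 4.1, 0.0, 0.9, 2.2]
token_string = discretize(weekly_changes)
print(token_string)
```

The resulting string can be pasted into a text-based arc diagram tool like any other sequence of characters.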

You can copy the data stream into the tool and toy around with it to get a feel for the possibilities of visual pattern discovery.*  I won't go into much more detail about using it, other than to say it appears to be a very useful tool for temporal pattern discovery.
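For a text-only feel of what the arcs connect, the sketch below lists fixed-length token runs that occur more than once in a stream, together with their start positions. This is a simplification written for illustration, not the matching procedure used by Arc Diagrams or by the Neoformix tool, and the function name is hypothetical.

```python
from collections import defaultdict


def repeated_ngrams(tokens, n):
    """Return {n-gram: [start positions]} for n-grams seen more than once."""
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tokens[i:i + n]].append(i)
    return {gram: idx for gram, idx in positions.items() if len(idx) > 1}


stream = "ECDCECCD"  # the example sequence from the post
repeats = repeated_ngrams(stream, 2)
for gram, idx in sorted(repeats.items()):
    print(gram, idx)
```

Each pair of positions for the same n-gram is roughly where an arc would join two occurrences of a repeated pattern.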

Please see the following for more ideas on arc diagrams and musical signatures:
http://www.research.ibm.com/visual/papers/arc-diagrams.pdf

http://turbulence.org/Works/song/mono.html

Blog mentioned:
http://www.neoformix.com/

* Not sure how to attach an .xls file here, but if anyone wants a copy of it, you can send me an email and I'll try to get it out to you.  Otherwise, you can simply grab a song lyric off the web to play with the tool.