<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-107568321062020427</id><updated>2012-02-04T12:59:32.918-08:00</updated><category term='Free auditing of Stanford AI and Machine Learning Courses w/Peter Norvig'/><category term='Finally'/><category term='Genetic Algorithm Systematic Trading Development -- Part 1'/><category term='TTR  -Part 2: Parameter Sweep Sensitivity over long run'/><category term='FFT (Fast Fourier Transform) of time series  -- promises and pitfalls towards trading'/><category term='Baum Welch'/><category term='Genetic Algorithm Systematic Trading Development-- Part 2'/><category term='Practical Implementation of Neural Network based time series (stock) prediction - PART 2'/><category term='Classification for stock directional prediction'/><category term='MINE: Maximal Information-based NonParametric Exploration'/><category term='Practical Implementation of Neural Network based time series (stock) prediction  -PART 5'/><category term='Can one beat a Random Walk-- IMPOSSIBLE (you say?)'/><category term='Modified Donchian Band Trend Follower using R'/><category term='High Low Clustering on intraday high frequency sampled data'/><category term='Practical Implementation of Neural Network based Stock Prediciton'/><category term='Quantitative Candlestick Pattern Recognition (HMM'/><category term='Free Online Stanford Machine Learning Course: Andrew Ng. 
Post Mortem.'/><category term='A practical R book on Data Mining:  &quot;Data Mining With R'/><category term='Is it possible to get a causal smoothed filter ?'/><category term='Wavelet Spectrogram Non-Stationary Financial Time Series analysis using R (TTR/Quantmod/dPlR) with USDEUR'/><category term='Arc Diagram and spatiotemporal data mining visualization'/><category term='and all that)'/><category term='Using J48 Decision Tree Classifier to Dynamically'/><category term='Time Series Calendar Heat Maps Using R'/><category term='Why isn&apos;t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)?'/><category term='2011 &quot;plunge&quot; headlines are in the air tonight'/><category term='Why isn&apos;t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? Part 2'/><category term='Quantitative Candlestick Pattern Recognition (Part 2 -- What&apos;s this Natural Language Processing'/><category term='Practical Implementation of Neural Network based Time Series (Stock) Prediction - PART 3'/><category term='Chaos in the Financial Markets?'/><category term='The Kalman Filter For Financial Time Series'/><category term='Conditioning Systems on Regime Variables'/><category term='Artificial Immune Systems and Financial Applications?'/><category term='Simulating Win/Loss streaks with R rle function'/><category term='Aug 4'/><category term='Quantmod'/><category term='Spatio-Temporal Data Mining: 2'/><category term='TTR'/><category term='Learning with Case Studies&quot;'/><category term='Pattern  Recognition: forward Boxplot Trajectories using R'/><category term='Genetic Algorithm Systematic Trading Development -- Part 3  (Python/VBA)'/><category term='Practical Implementation of Neural Network based t'/><title type='text'>Intelligent Trading</title><subtitle type='html'>Discovering edge using Machine Learning, Data Mining, and Bio Inspired Algorithms to augment traditional Systematic Development.</subtitle><link 
rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>35</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-8916711774225930056</id><published>2012-01-31T23:21:00.000-08:00</published><updated>2012-01-31T23:53:50.056-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MINE: Maximal Information-based NonParametric Exploration'/><title type='text'>MINE: Maximal Information-based NonParametric Exploration</title><content type='html'>&lt;br /&gt;There was a lot of buzz in the blogosphere as well as the science community about a new family of algorithms that are able to find non-linear relationships over extremely large fields of data. What makes it particularly useful is that the measure(s) it uses are based upon mutual information rather than standard pearson's correlation type measures, which do not capture non-linear relationships well. 
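&lt;br /&gt;&lt;br /&gt;As a rough illustration of that last point (a toy sketch on made-up data, not part of the MINE software or the paper), an exact but symmetric non-linear relationship can leave the Pearson correlation at essentially zero:&lt;br /&gt;###############################################&lt;br /&gt;# Toy example (assumed data): y depends on x exactly, but not linearly&lt;br /&gt;x&amp;lt;-seq(-1,1,length.out=201)&lt;br /&gt;y&amp;lt;-x^2&lt;br /&gt;# Pearson r is essentially zero because x is symmetric about 0&lt;br /&gt;cor(x,y)&lt;br /&gt;# so the R^2 of a straight-line fit is essentially zero as well&lt;br /&gt;cor(x,y)^2&lt;br /&gt;###############################################&lt;br /&gt;A mutual-information-based measure is meant to flag this kind of dependence even when the linear correlation does not.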
&lt;br /&gt;&lt;br /&gt;The (Java-based) software can be downloaded here: http://www.exploredata.net/  In addition, the software can be run directly from R.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-_twTNhFO-Ko/TyjqVlh-aFI/AAAAAAAAAU0/vWh1JKRoOMs/s1600/shot3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://2.bp.blogspot.com/-_twTNhFO-Ko/TyjqVlh-aFI/AAAAAAAAAU0/vWh1JKRoOMs/s400/shot3.png" width="396" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&amp;nbsp;Fig 1. Typical non-linear relationship exemplified by intermarket relationships.&lt;/div&gt;&lt;br /&gt;The algorithm seems promising, as it could allow us to mine very large data sets (such as financial intermarket relationships) and find potentially meaningful non-linear relationships. If we were to use the typical Pearson correlation measures, such relationships would show very small R^2 values, and thus be discarded as non-significant.&lt;br /&gt;&lt;br /&gt;I decided to take it for a spin on a non-linear example taken from M. Katsanos' book on intermarket trading strategies (p 25. fig 2.3).  In figure 1, we can clearly see that the relationship between markets is non-linear, and thus the traditional linear fit returns a low R^2 value of .143 (red line); a loess fit is also shown in blue.  After running the same data through MINE, the results, returned in a .csv file, were... 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;MIC&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (strength)&amp;nbsp;&amp;nbsp;&amp;nbsp; MIC-p^2&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (nonlinearity)&lt;br /&gt;0.16691002&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.62445&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7.129283&amp;nbsp;&amp;nbsp;&amp;nbsp; -0.3777441&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The MIC (Maximal Information Coefficient) of .167 was not much greater than the R^2 measure of .143 above. However, the paper notes that as the signal becomes more obscured by noise, the MIC degrades comparably.  &lt;br /&gt;&lt;br /&gt;The next step would be to find some type of fit to minimize the noise component and make updated comparisons.&lt;br /&gt;&lt;br /&gt;To better illustrate how useful it might be, I am attaching a screenshot of the reference material here.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-dV7DmXqXn-U/Tyjm-MUe5cI/AAAAAAAAAUs/YL2pCDV5kCg/s1600/paper_Ex.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://4.bp.blogspot.com/-dV7DmXqXn-U/Tyjm-MUe5cI/AAAAAAAAAUs/YL2pCDV5kCg/s400/paper_Ex.png" width="342" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Figure 2. Reproduced from Fig 6. 
'www.sciencemag.org/cgi/content/full/334/6062/1518/DC1'&lt;br /&gt;&lt;br /&gt;Notice the MIC Score measure outperforms other traditional methods on many non-linear structural relationships.&lt;br /&gt;&lt;br /&gt;Here is the full R-Code to repeat the basic experiment.&lt;br /&gt;###############################################&lt;br /&gt;# MINE example from intelligenttradingtech.blogspot.com 1/31/2012&lt;br /&gt;&lt;br /&gt;library(quantmod)&lt;br /&gt;library(ggplot2)&lt;br /&gt;&lt;br /&gt;getSymbols('^GSPC',src='yahoo',from='1992-01-07',to='2007-12-31')&lt;br /&gt;getSymbols('^N225',src='yahoo',from='1992-01-07',to='2007-12-31')&lt;br /&gt;&lt;br /&gt;sym_frame&amp;lt;-merge(GSPC[,6],N225[,6],all=FALSE)&lt;br /&gt;names(sym_frame)&amp;lt;-c('GSPC','N225')&lt;br /&gt;&lt;br /&gt;p&amp;lt;-qplot(N225, GSPC, data=data.frame(coredata(sym_frame)),&lt;br /&gt;geom=c('point'), xlab='NIKKEI',ylab='S&amp;amp;P_500',main='S&amp;amp;P500 vs NIKKEI 1992-2007')&lt;br /&gt;&lt;br /&gt;fit&amp;lt;-lm(GSPC~ N225, data=data.frame(coredata(sym_frame)))&lt;br /&gt;summary(fit)&lt;br /&gt;fitParam&amp;lt;-coef(fit)&lt;br /&gt;&lt;br /&gt;p+geom_abline(intercept=fitParam[1], slope=fitParam[2],colour='red',size=2)+geom_smooth(method='loess',size=2,colour='blue')&lt;br /&gt;&lt;br /&gt;### MINE results&lt;br /&gt;library("rJava")&lt;br /&gt;setwd('/home/self/Desktop/MINE/')&lt;br /&gt;&lt;br /&gt;write.csv(data.frame(coredata(sym_frame)),file="GSPC_N225.csv",row.names=FALSE)&lt;br /&gt;source("MINE.r")&lt;br /&gt;MINE("GSPC_N225.csv","all.pairs")&lt;br /&gt;&lt;br /&gt;##########################################################&lt;br /&gt;&lt;br /&gt;The referenced paper is, "Detecting Novel Associations in Large Data Sets"&lt;br /&gt;David N. Reshef, et al.&lt;br /&gt;Science 334, 1518 (2011)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As an aside, I've been hooked on a sitcom series called, "Numb3rs," playing on Amazon Prime. 
It's about an FBI agent who gets assistance from his genius brother, a professor of Mathematics at a prestigious University. So far, they've discussed markov chains, bayesian statistics, data mining, econometrics, heat maps, and a host of other similar concepts applied to forensics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-8916711774225930056?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/8916711774225930056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2012/01/there-was-lot-of-buzz-in-blogosphere-as.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8916711774225930056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8916711774225930056'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2012/01/there-was-lot-of-buzz-in-blogosphere-as.html' title='MINE: Maximal Information-based NonParametric Exploration'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-_twTNhFO-Ko/TyjqVlh-aFI/AAAAAAAAAU0/vWh1JKRoOMs/s72-c/shot3.png' height='72' 
width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-4177052682764556975</id><published>2012-01-01T01:33:00.000-08:00</published><updated>2012-01-01T01:40:57.264-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Free Online Stanford Machine Learning Course: Andrew Ng. Post Mortem.'/><title type='text'>Free Online Stanford Machine Learning Course: Andrew Ng. Post Mortem.</title><content type='html'>Happy New Year to all the viewers of this blog and just a short reminder that the course will be available again this January.&lt;br /&gt;http://www.ml-class.org/course/auth/welcome&lt;br /&gt;&lt;br /&gt;Having audited the course, I would highly recommend it to anyone who is interested in a very hands on learning session covering many of the topics I've posted about (and many other areas, such as how to deal with over/under fitting). Kudos to Dr. Ng for a fantastic, engaging, and informative course.&lt;br /&gt;&lt;br /&gt;As an added incentive, users will become familiarized with many vectorized approaches to programming (via Octave), which are very useful in languages such as Python and R.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-4177052682764556975?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/4177052682764556975/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2012/01/free-auditing-of-stanford-ai-and.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/4177052682764556975'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/107568321062020427/posts/default/4177052682764556975'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2012/01/free-auditing-of-stanford-ai-and.html' title='Free Online Stanford Machine Learning Course: Andrew Ng. Post Mortem.'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-8436296409953802808</id><published>2011-10-14T18:02:00.000-07:00</published><updated>2011-10-14T18:17:43.591-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Free auditing of Stanford AI and Machine Learning Courses w/Peter Norvig'/><title type='text'>Free auditing of Stanford AI and Machine Learning Courses w/Peter Norvig</title><content type='html'>Just wanted to notify viewers of a few great courses that are being offered free for auditing and/or participation by well known industry experts, including co-author of the classic text on AI, 'Artificial Intelligence: A Modern Approach,' Peter Norvig and Prof. 
Andrew Ng.&lt;br /&gt;&lt;br /&gt;http://www.ai-class.com/&lt;br /&gt;see also,&lt;br /&gt;http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2011/10/14/BUFR1LH9JR.DTL&lt;br /&gt;&lt;br /&gt;The notice is a bit late, but they are still accepting registrations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-8436296409953802808?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/8436296409953802808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/10/free-auditing-of-stanford-ai-and.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8436296409953802808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8436296409953802808'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/10/free-auditing-of-stanford-ai-and.html' title='Free auditing of Stanford AI and Machine Learning Courses w/Peter Norvig'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-5352608268047431520</id><published>2011-10-06T12:12:00.000-07:00</published><updated>2011-10-06T22:42:30.673-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Spatio-Temporal Data Mining: 2'/><title type='text'>Spatio-Temporal Data Mining: 2</title><content type='html'>There are many visual methods 
used to identify patterns in space and time.  I've discussed some in prior threads and will show a few others briefly here.  One of the most difficult questions I often hear from others regarding markov type approaches, is how to identify states to be processed.&lt;br /&gt;&lt;br /&gt;It is a similar problem that one encounters using simple linear type factor analysis. Unfortunately, there is no simple answer; however, because data streams are becoming so vast it becomes almost impossible to enumerate over all possible state sets. Visual mining techniques can be incredibly helpful in narrowing down that space as well as feature reduction.&amp;nbsp; I often use these types of visualizations back and forth with unsupervised classification type learners to converge on useful state identifications.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/--FsPfrUfg3M/To33ZNFdMKI/AAAAAAAAASk/yl4AjAGtU5I/s1600/spctmp1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="363" src="http://1.bp.blogspot.com/--FsPfrUfg3M/To33ZNFdMKI/AAAAAAAAASk/yl4AjAGtU5I/s640/spctmp1.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Fig 1. Spatio-Temporal State plot&lt;br /&gt;&lt;br /&gt;Figure 1 gives an idea on visualizing states with respect to time.  But having such knowledge in isolation doesn't give us much use. 
We are more interested in looking for Bayesian type relationships between states that give some probabilities between linked states in time.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-9J6B_6gCevY/To34DkJU78I/AAAAAAAAASs/FhWTvFCBxjs/s1600/fluc1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://2.bp.blogspot.com/-9J6B_6gCevY/To34DkJU78I/AAAAAAAAASs/FhWTvFCBxjs/s400/fluc1.png" width="382" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Fig 2. Fluctuation Plot&lt;br /&gt;&lt;br /&gt;Several visual methods exist to capture the relationships visually. One common plot used in language processing and information theory, is a fluctuation plot. The above plot was built using the same state data as the first graph. It is often used to determine conditional relationships between symbols such as alphabet tokens. The size of each box is directly proportional to the weight of the transition probabilities between row and column states in tabular data. 
An example would be to think of the letters yzy, which are more commonly followed by g (as in syzygy) than by any other state token; thus, one would expect to quickly spot a larger box where the 'yzy' row token n-gram meets the 'g' column token.&lt;br /&gt;&lt;br /&gt;Both plots were produced in R.&amp;nbsp; ggfluctuation() is the plot command used, from ggplot2.&amp;nbsp; I am currently investigating how much easier and faster it might be to process such visualizations in tools like Protovis and Processing.&amp;nbsp; I've been especially inspired by reading some of Nathan Yau's excellent visualization work in his book, 'Visualize This.' I included it in the link to the right for interested readers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-5352608268047431520?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/5352608268047431520/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/10/spatiotemporal-data-mining-2.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5352608268047431520'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5352608268047431520'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/10/spatiotemporal-data-mining-2.html' title='Spatio-Temporal Data Mining: 2'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/--FsPfrUfg3M/To33ZNFdMKI/AAAAAAAAASk/yl4AjAGtU5I/s72-c/spctmp1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-675748711275860256</id><published>2011-09-23T17:49:00.000-07:00</published><updated>2011-09-26T19:39:27.137-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Arc Diagram and spatiotemporal data mining visualization'/><title type='text'>Arc Diagram and spatiotemporal data mining visualization</title><content type='html'>I won't spend too much time discussing this fascinating topic other than to say it relates very much to prior discussions about pattern discovery via visual data mining (see lexical dispersion plots for example).&amp;nbsp; I happened across an interesting visualization method called the Arc Diagram, developed by Martin Wattenberg. Working for data visualization groups at IBM and later Google, he developed some interesting visual representations of spatiotemporal data. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-xgIl6E9wgv8/Tn0VnOB5B3I/AAAAAAAAASc/t5_BFlzh-O0/s1600/arcd1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://1.bp.blogspot.com/-xgIl6E9wgv8/Tn0VnOB5B3I/AAAAAAAAASc/t5_BFlzh-O0/s640/arcd1.jpg" width="378" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Fig 1. Arc Diagram and legend with example of discretized pattern archetype. 
&lt;br /&gt;&lt;br /&gt;The resulting plot generates some fascinating temporal signatures,  similar to what one might see in&amp;nbsp; phase-space portraits from chaos.  However, they have been frequently utilized to look for spatiotemporal  signatures in music.&amp;nbsp; One might discern a type of underlying order or  visual signature of complexity as well as recurring patterns in sequential objects ranging from text  based lyrical information to musical sheet notes.&lt;br /&gt;&lt;br /&gt;&amp;nbsp;Figure 1 shows an example of how one might utilize this tool towards temporal pattern discovery in time series. A weekly series from SPY has been discretized into alphabet tokens, based upon the bin ranges in the included legend. The small chart in the example would decode an archetypal pattern for the following sequence: ECDCECCD, into a time series representation of the 8 week data symbol. The following interactive java tool from another blogger, Neoformix, was then used to translate the data into an Arc Diagram.&amp;nbsp; http://www.neoformix.com/Projects/DocumentArcDiagrams/index.html&amp;nbsp; .&amp;nbsp; Read from top to bottom, one can look at recurring and related patterns that are repeated over time; certain behavior might warrant further investigation.&lt;br /&gt;&lt;br /&gt;You can copy the following data stream into the tool to toy around with the tool to get a feel for the possibilities of visual pattern discovery.*&amp;nbsp; I won't go into too much more detail about utilizing it, other than to say it appears to be a very useful tool in temporal based pattern discovery.&lt;br /&gt;&lt;br /&gt;Please see the following for more ideas on arc diagrams and musical signatures:&lt;br /&gt;http://www.research.ibm.com/visual/papers/arc-diagrams.pdf &lt;br /&gt;&lt;br /&gt;http://turbulence.org/Works/song/mono.html&lt;br /&gt;&lt;br /&gt;Blog mentioned:&lt;br /&gt;http://www.neoformix.com/&lt;br /&gt;&lt;br /&gt;* Not sure how to attach .xls file here, but if 
anyone wants a copy of the .xls file, you can send me an email and I'll try to get it out to you.&amp;nbsp; Otherwise, you can simply grab a song lyric off the web to play with the tool.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-675748711275860256?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/675748711275860256/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/09/arc-diagram-and-spatiotemporal-data.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/675748711275860256'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/675748711275860256'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/09/arc-diagram-and-spatiotemporal-data.html' title='Arc Diagram and spatiotemporal data mining visualization'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-xgIl6E9wgv8/Tn0VnOB5B3I/AAAAAAAAASc/t5_BFlzh-O0/s72-c/arcd1.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3360151711565606498</id><published>2011-08-04T15:44:00.000-07:00</published><updated>2011-08-04T15:53:39.936-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Aug 4'/><category 
scheme='http://www.blogger.com/atom/ns#' term='2011 &quot;plunge&quot; headlines are in the air tonight'/><title type='text'>Aug 4, 2011 "plunge" headlines are in the air tonight</title><content type='html'>Today's financial headlines are littered with the word 'plunge.'&amp;nbsp; Considering today's (cl-cl) drop on the S&amp;amp;P500 was just about -5%, I don't know that I would exactly call that a plunge.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-cAGZ6fj05gw/TjsfOuQf8II/AAAAAAAAASI/Wx34eYk5ZTs/s1600/plungeblog.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="260" src="http://4.bp.blogspot.com/-cAGZ6fj05gw/TjsfOuQf8II/AAAAAAAAASI/Wx34eYk5ZTs/s400/plungeblog.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Fig 1. 
Historical ts plot of S&amp;amp;P500 returns &amp;lt;= -5%&lt;br /&gt;&lt;br /&gt;The following R code produced a time series plot of historical occasions where this occurred.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;/code&gt;&lt;code class="bash plain"&gt;###################################################&lt;/code&gt;&lt;code class="bash plain"&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;library(quantmod)&lt;br /&gt;&lt;br /&gt;getSymbols("^GSPC",from="1950-01-01",to="2012-01-01")&lt;br /&gt;adj&amp;lt;-GSPC$GSPC.Adjusted&lt;br /&gt;rtn&amp;lt;-(adj/lag(adj,1)-1)[2:length(adj)]&lt;br /&gt;r05&amp;lt;-rtn[rtn&amp;lt;= -.05]&lt;br /&gt;&lt;br /&gt;plot(sort(r05),type='o',main='S&amp;amp;P500 1950-present returns &amp;lt;= -5%')&lt;br /&gt;&lt;code class="bash plain"&gt;&lt;/code&gt;&lt;br /&gt;&lt;code class="bash plain"&gt;###################################################&lt;/code&gt;&lt;code class="bash plain"&gt;&lt;/code&gt;&lt;br /&gt;Although the frequency of such occurrences is&amp;nbsp; arguably rare, the 1987 drop is much more worthy of the 1 day label 'plunge.'&lt;br /&gt;&lt;br /&gt;One other disturbing observation in the data, however, is the large temporal clustering of occurrences in the recent 2008 region.&amp;nbsp; Now that's behavior to be concerned about (not to mention revised flash crash data pts.).&lt;br /&gt;&lt;br /&gt;filtered 1 day cl-cl returns &amp;lt;=-5% sorted by date&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Cvz8Ywx-xV8/Tjsi3pn-5SI/AAAAAAAAASM/S6ylMDk298w/s1600/rtns.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://3.bp.blogspot.com/-Cvz8Ywx-xV8/Tjsi3pn-5SI/AAAAAAAAASM/S6ylMDk298w/s400/rtns.png" width="210" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/107568321062020427-3360151711565606498?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3360151711565606498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/08/aug-4-2011-plunge-headlines-are-in-air.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3360151711565606498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3360151711565606498'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/08/aug-4-2011-plunge-headlines-are-in-air.html' title='Aug 4, 2011 &quot;plunge&quot; headlines are in the air tonight'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-cAGZ6fj05gw/TjsfOuQf8II/AAAAAAAAASI/Wx34eYk5ZTs/s72-c/plungeblog.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-1604836881454541412</id><published>2011-07-28T14:54:00.000-07:00</published><updated>2011-07-28T18:25:09.512-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Pattern  Recognition: forward Boxplot Trajectories using R'/><title type='text'>Pattern  Recognition: forward Boxplot Trajectories using R</title><content type='html'>Although the following discussion can apply to the Quantitative Candlestick Pattern Recognition series, it is 
addressing the same issue as any basic conditional type system -- how and when to exit.&amp;nbsp; The following is one way to visualize and think about it, and is by no means optimal.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-bqEOyXAJwAs/TjILK4Ef1lI/AAAAAAAAASA/tw0eGjADDiI/s1600/traj4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/-bqEOyXAJwAs/TjILK4Ef1lI/AAAAAAAAASA/tw0eGjADDiI/s320/traj4.png" width="319" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Fig 1. Posterior Boxplot Trajectory&lt;br /&gt;&lt;br /&gt;Often we attempt to find some set of prior input patterns that leads to profitable posterior outcomes.&amp;nbsp; However, in most of the available examples, we are typically only given heuristics and rules of thumb on where to exit.&amp;nbsp; This might make sense, since no one can accurately predict where to exit. 
However, with knowledge of past samples, we can have some idea of where a good target to exit might be, given the prior knowledge of forward trajectories.&amp;nbsp; I dubbed the name 'boxplot trajectory', here, as I think it's a useful way to visualize a group of many possible outcome trajectories for further analysis.&lt;br /&gt;&lt;br /&gt;In this example, a set of daily price based patterns was analyzed via a proprietary program I wrote in R, which resulted in an input pattern yielding a set of 52 samples that met my conditional criteria.&amp;nbsp; Fig 1 illustrates a way to look at the trajectory outcomes based upon one of the profitable patterns in the conditional criteria. The bottom graph is simply the plot of median results of each data point in the trajectory. We often try to imagine the best way to exit without foreknowledge of the future (and somewhat less rule of thumb based criteria).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-gE67oXWDIF8/TjHGL965KmI/AAAAAAAAARk/oICjq4Hq80w/s1600/traj3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="262" src="http://2.bp.blogspot.com/-gE67oXWDIF8/TjHGL965KmI/AAAAAAAAARk/oICjq4Hq80w/s320/traj3.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Fig 2. 
Trajectory tree.&lt;br /&gt;&lt;br /&gt;One approach would be to use some type of exiting rule based upon the statistical median of each sequential point's range.&amp;nbsp; Knowing that 1/2 of the vertices occur above and 1/2 below the median, we should expect to hit at least 1/2 of the targets at or above the median. Given that the 3rd point has the highest median, it makes sense to exit there rather than wait for a greater gain further out (where the median is even lower).&amp;nbsp; So if we take the median value of the 3rd point as a target, we achieve a fixed target of 1.59% on 27/52 of the total samples.&lt;br /&gt;&lt;br /&gt;Of the remaining samples, we may now wish to exit on the 11th bar (or earlier if the same target is hit earlier) at a target of .556%, which is achieved on 13 of the remaining 25 samples (13/52 of the total).&amp;nbsp; This leaves only the last bar, for which we simply use the average return as the weighted return value for that target, in this case -1.74% for the remaining 12/52 of the samples. Notice we will always be left with the worst contenders, which were put off until the end.&lt;br /&gt;&lt;br /&gt;The expectation yields E(rtn) = 27/52*.0159 + 13/52*.0056 - 12/52*.017 = .0057,&lt;br /&gt;eking out a small positive average gain of .57%. Compounded, this gives:&lt;br /&gt;(1+.0159)^27*(1+.0056)^13*(1-.017)^12 ~ 1.34, i.e. a 34% rtn for 52 trades, each less than 3 days in length.&amp;nbsp; Hit rate (as a secondary observation) is 77% in this case.&lt;br /&gt;&lt;br /&gt;The approach is particularly appealing for a high frequency strategy with very low commissions. 
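&lt;br /&gt;&lt;br /&gt;The expectation and compounded figures above can be re-derived with a few lines of R. This is only a sketch for checking the arithmetic: the variable names (n, r, p) are mine, and the inputs are the rounded values quoted in the text, not the underlying trade data.&lt;br /&gt;
&lt;pre&gt;
```r
# Exit buckets from the example: counts out of 52 samples and the return booked in each
n = c(27, 13, 12)                  # 3rd-bar target, 11th-bar target, final-bar exit
r = c(0.0159, 0.0056, -0.017)      # rounded per-trade returns quoted in the text
p = n / sum(n)                     # share of samples falling in each bucket

expectation = sum(p * r)           # average return per trade
compounded  = prod((1 + r)^n) - 1  # total return if all 52 trades are compounded
hit_rate    = sum(n[r > 0]) / sum(n)

print(round(c(expectation = expectation, compounded = compounded, hit_rate = hit_rate), 4))
```
&lt;/pre&gt;
With these rounded inputs the expectation comes to roughly .0057 per trade, the compounded figure to roughly 34%, and the hit rate to 40/52, in line with the numbers quoted.&lt;br /&gt;&lt;br /&gt;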
Notice it's by no means comprehensive (and yes, I've only shown in sample here), but rather a novel way to think about exiting strategies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-1604836881454541412?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/1604836881454541412/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/07/pattern-recognition-forward-boxplot.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1604836881454541412'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1604836881454541412'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/07/pattern-recognition-forward-boxplot.html' title='Pattern  Recognition: forward Boxplot Trajectories using R'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-bqEOyXAJwAs/TjILK4Ef1lI/AAAAAAAAASA/tw0eGjADDiI/s72-c/traj4.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-7842844415337977845</id><published>2011-05-17T15:34:00.000-07:00</published><updated>2011-05-23T13:05:59.212-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Simulating Win/Loss streaks with R rle function'/><title type='text'>Simulating 
Win/Loss streaks with R rle function</title><content type='html'>The following script allows you to simulate sample runs of Win, Loss, Breakeven streaks based on a random distribution, using the run length encoding function, rle in R. Associated probabilities are entered as a vector argument in the sample function.&lt;br /&gt;&lt;br /&gt;You can view the actual sequence of trials (and consequent streaks) by looking at the trades result.&amp;nbsp; maxrun returns a vector of maximum number of Win, Loss, Breakeven streaks for each sample run. And lastly, the prop table gives a table of proportion of run transition pairs from losing streak of length n to streak of all alternate lengths.&lt;br /&gt;&lt;br /&gt;Example output (max run length of losses was 8 here):&lt;br /&gt;&lt;br /&gt;100*prop.table(tt)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; lt.2&lt;br /&gt;lt.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1 41.758 14.298&amp;nbsp; 5.334&amp;nbsp; 1.662&amp;nbsp; 0.875&amp;nbsp; 0.131&amp;nbsp; 0.000&amp;nbsp; 0.044&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2 14.692&amp;nbsp; 4.897&amp;nbsp; 1.924&amp;nbsp; 0.787&amp;nbsp; 0.394&amp;nbsp; 0.087&amp;nbsp; 0.131&amp;nbsp; 0.000&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3&amp;nbsp; 4.985&amp;nbsp; 2.405&amp;nbsp; 0.525&amp;nbsp; 0.350&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.044&amp;nbsp; 0.000&lt;br /&gt;&amp;nbsp;&amp;nbsp; 4&amp;nbsp; 1.662&amp;nbsp; 0.875&amp;nbsp; 0.306&amp;nbsp; 0.087&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&lt;br /&gt;&amp;nbsp;&amp;nbsp; 5&amp;nbsp; 0.831&amp;nbsp; 0.219&amp;nbsp; 
0.175&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.044&amp;nbsp; 0.000&amp;nbsp; 0.000&lt;br /&gt;&amp;nbsp;&amp;nbsp; 6&amp;nbsp; 0.087&amp;nbsp; 0.131&amp;nbsp; 0.044&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&lt;br /&gt;&amp;nbsp;&amp;nbsp; 7&amp;nbsp; 0.087&amp;nbsp; 0.087&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&lt;br /&gt;&amp;nbsp;&amp;nbsp; 8&amp;nbsp; 0.044&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&amp;nbsp; 0.000&lt;br /&gt;&lt;br /&gt;maxrun&lt;br /&gt;&amp;nbsp;B&amp;nbsp; L&amp;nbsp; W &lt;br /&gt;&amp;nbsp;3&amp;nbsp; 8 17 &lt;br /&gt;&lt;br /&gt;-----------------------------------------------------------------------------------------&lt;br /&gt;#generate simulations of win/loss streaks using the rle function&lt;br /&gt;&lt;br /&gt;#probabilities are given in the same order as the outcomes W, L, B&lt;br /&gt;trades&amp;lt;-sample(c("W","L","B"),10000,prob=c(.6,.35,.05),replace=TRUE)&lt;br /&gt;traderuns&amp;lt;-rle(trades)&lt;br /&gt;tr.val&amp;lt;-traderuns$values&lt;br /&gt;tr.len&amp;lt;-traderuns$lengths&lt;br /&gt;maxrun&amp;lt;-tapply(tr.len,tr.val,max)&lt;br /&gt;&lt;br /&gt;#streaks of losing trades&lt;br /&gt;lt&amp;lt;-tr.len[which(tr.val=='L')]&lt;br /&gt;lt.1&amp;lt;-lt[1:(length(lt)-1)]&lt;br /&gt;lt.2&amp;lt;-lt[2:(length(lt))]&lt;br /&gt;&lt;br /&gt;#simple table of losing trade run streak(n) frequencies&lt;br /&gt;table(lt)&lt;br /&gt;&lt;br /&gt;#generate joint ensemble table streak(n) vs streak(n+1)&lt;br /&gt;tt&amp;lt;-table(lt.1,lt.2)&lt;br /&gt;#convert to proportions&lt;br /&gt;options(digits=2)&lt;br /&gt;100*prop.table(tt)&lt;br /&gt;maxrun&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-7842844415337977845?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://intelligenttradingtech.blogspot.com/feeds/7842844415337977845/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/05/simulating-winloss-streaks-with-r-rle.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/7842844415337977845'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/7842844415337977845'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/05/simulating-winloss-streaks-with-r-rle.html' title='Simulating Win/Loss streaks with R rle function'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-8825519421484423678</id><published>2011-05-10T19:42:00.000-07:00</published><updated>2011-05-13T12:37:11.626-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='High Low Clustering on intraday high frequency sampled data'/><title type='text'>High Low Clustering on intraday high frequency sampled data</title><content type='html'>Nothing unusually exciting on this post, but I happened to be engaged in some particle based methods recently and made some simple visual observations as I was setting up some of the sampling environment in R.&amp;nbsp; I am also using Rkward and Ubuntu to generate, so I'm gathering everything from the current environment (including graphics).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://2.bp.blogspot.com/-1biitg2dFjE/Tc2HrRtQghI/AAAAAAAAARQ/MQPIgYOYpHA/s1600/hlstudy1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="311" src="http://2.bp.blogspot.com/-1biitg2dFjE/Tc2HrRtQghI/AAAAAAAAARQ/MQPIgYOYpHA/s320/hlstudy1.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-1sS7bk3KWZI/Tcn1AXpVFOI/AAAAAAAAARI/RE5fBGuQ8wo/s1600/aaplmaxmin.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Fig 1. Parallel plot of half hr sample of High and Low intraday data points vs time (Max is purple dots, Min are red). Fig 2. Cumulative count of high low events per interval (blue = total high and low).&lt;br /&gt;&lt;br /&gt;The plot illustrates sampled intraday data at half hour increments.&lt;br /&gt;The highs and lows of each sample interval are overlaid using purple to denote an intraday high and red to denote an intraday low. &lt;br /&gt;Interesting points of observation are--&lt;br /&gt;&lt;br /&gt;1) The high and low samples tend to be clustered at open, midday, and close.&lt;br /&gt;2) High and low events do not appear to be uniformly and randomly distributed over time. &lt;br /&gt;This kind of data processing is useful towards generating, exploring, and evaluating pattern based setups. &lt;br /&gt;&lt;br /&gt;The study is by no means complete or conclusive, just stopping by to show more of the type of data processing and visual capabilities that R is capable of.&amp;nbsp;&amp;nbsp; If anyone has done any more conclusive studies I'd be interested to hear.&lt;br /&gt;&lt;br /&gt;P.S. 
If anyone notices any odd changes, for some reason Google was having some issues the last few days, and it appears to have reverted to my original (not ready to launch) draft.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-8825519421484423678?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/8825519421484423678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/05/high-low-clustering-on-intraday-sampled.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8825519421484423678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8825519421484423678'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/05/high-low-clustering-on-intraday-sampled.html' title='High Low Clustering on intraday high frequency sampled data'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-1biitg2dFjE/Tc2HrRtQghI/AAAAAAAAARQ/MQPIgYOYpHA/s72-c/hlstudy1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3766421220893456184</id><published>2011-03-08T15:12:00.000-08:00</published><updated>2011-03-10T01:47:58.253-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Can one beat a Random Walk-- 
IMPOSSIBLE (you say?)'/><title type='text'>Can one beat a Random Walk-- IMPOSSIBLE (you say?)</title><content type='html'>Firstly, apologies for the long absence as I've been busy with a few things.&amp;nbsp; Secondly, apologies for the horrific use of caps in the title (for the grammar monitors).&amp;nbsp; Certainly, you'll gain something useful from today's musing, as it's a pretty profound insight for most (was for me at the time). I've also considered carefully, whether or not to divulge this concept, but considering it's often overlooked and in the public literature (I'll even share a source), I decided to discuss it.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-Mkk0T0mo9RQ/TXidwtzvYZI/AAAAAAAAARE/zSZLTfNIffY/s1600/rw.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="310" src="https://lh6.googleusercontent.com/-Mkk0T0mo9RQ/TXidwtzvYZI/AAAAAAAAARE/zSZLTfNIffY/s320/rw.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh3.googleusercontent.com/-MS01EvZqBkE/TXa0KPeMZCI/AAAAAAAAARA/yTocKd21dsY/s1600/rwit.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Fig 1. 
Random Walk and the 75% rule&lt;br /&gt;&lt;br /&gt;I've seen the same debate launched over and over on various chat boards, concerning the theoretical impossibility of beating a random walk.&amp;nbsp; In this case, I am giving you the code to determine the answer yourself.&lt;br /&gt;The requirements: 1) the generated data must be from an IID Gaussian distribution 2) the series must be coaxed to a stationary form.&lt;br /&gt;&lt;br /&gt;The following script will generate a random series of data and follow the so-called 75% rule, which says,&lt;br /&gt;Pr[(Price(n) &amp;gt; Price(n-1) &amp;amp; Price(n-1) &amp;lt; Price_median) Or (Price(n) &amp;lt; Price(n-1) &amp;amp; Price(n-1) &amp;gt; Price_median)] = 75%.&amp;nbsp; This very insightful rule (which is explained both mathematically and in layman's terms in the book 'Statistical Arbitrage' linked on the Amazon box to the right) shows that, given some stationary, IID, random sequence that has an underlying Gaussian distribution, the above rule set converges to a correct prediction rate of 75%!&lt;br /&gt;&lt;br /&gt;Now, we all know that market data is not Gaussian (nor is it commission/slippage/friction free), and therein lies the rub. But hopefully, it gives you some food for thought as well as a bit of knowledge to retort with when you hear the debates about the impossibility of beating a random walk. &lt;br /&gt;&lt;br /&gt;R Code is below. 
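&lt;br /&gt;&lt;br /&gt;Before the full script, here is a quick numerical check of the 75% figure itself. The intuition: if the previous value sits at quantile u of the distribution and u is below one half, an independent next draw is higher with probability 1 - u; averaging 1 - u over the lower half and u over the upper half gives 0.75. The sketch below is my own illustration, not taken from the book; the names (prev, nxt, direction, hit) and the sample size are arbitrary choices.&lt;br /&gt;
&lt;pre&gt;
```r
set.seed(1)
n = 100000
x = rnorm(n)                          # stationary IID Gaussian draws
m = median(x)

prev = x[-n]                          # value at step i
nxt  = x[-1]                          # value at step i + 1
direction = ifelse(prev > m, -1, 1)   # above the median: call down; below: call up
hit = sign(nxt - prev) == direction   # TRUE when the call matched the actual move
hit_rate = mean(hit)
print(hit_rate)
```
&lt;/pre&gt;
The printed hit rate should land close to 0.75; the script that follows applies the same rule and plots the resulting equity curve.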
&lt;br /&gt;&lt;br /&gt;##################################################&lt;br /&gt;#gen rnd seq for 75% RULE&lt;br /&gt;&lt;br /&gt;#generate stationary rw time series&lt;br /&gt;rw&amp;lt;-rnorm(100)&lt;br /&gt;&lt;br /&gt;m&amp;lt;-median(rw)&lt;br /&gt;trade&amp;lt;-rep(0,length(rw))&lt;br /&gt;&lt;br /&gt;for(i in 1:(length(rw)-1)){&lt;br /&gt;if(rw[i] &amp;lt; m) trade[i]&amp;lt;- (rw[i+1]-rw[i])&lt;br /&gt;if(rw[i] &amp;gt; m) trade[i]&amp;lt;- (rw[i]-rw[i+1])&lt;br /&gt;if(rw[i] == m) trade[i]&amp;lt;- 0}&lt;br /&gt;&lt;br /&gt;eq_curve&amp;lt;-cumsum(trade)&lt;br /&gt;&lt;br /&gt;par(mfrow=c(2,1))&lt;br /&gt;plot(rw,type='l',main='random walk')&lt;br /&gt;plot(eq_curve,type='l',main='eq_curve')&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-3766421220893456184?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3766421220893456184/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/03/can-one-beat-random-walk-impossible-you.html#comment-form' title='44 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3766421220893456184'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3766421220893456184'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2011/03/can-one-beat-random-walk-impossible-you.html' title='Can one beat a Random Walk-- IMPOSSIBLE (you say?)'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='https://lh6.googleusercontent.com/-Mkk0T0mo9RQ/TXidwtzvYZI/AAAAAAAAARE/zSZLTfNIffY/s72-c/rw.jpg' height='72' width='72'/><thr:total>44</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-5600003462654699569</id><published>2010-11-19T18:55:00.000-08:00</published><updated>2010-11-19T18:55:48.507-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='A practical R book on Data Mining:  &quot;Data Mining With R'/><category scheme='http://www.blogger.com/atom/ns#' term='Learning with Case Studies&quot;'/><category scheme='http://www.blogger.com/atom/ns#' term='Finally'/><title type='text'>Finally! A practical R book on Data Mining:  "Data Mining With R, Learning with Case Studies," by Luis Torgo</title><content type='html'>I've been a bit busy lately with a few big things, however, I wanted to stop by and mention a fantastic book for those who have been following along the R examples.&amp;nbsp; Anyone who's followed my blog knows that I'm big on practical books with examples.&amp;nbsp; There are also three main open source tools I've discussed with regards to prototyping trading systems: Weka, Python, and R.  
Of the three tools mentioned, I've been able to recommend Witten and Frank's book on Data Mining for Weka, and Stephen Marsland's book on Machine Learning as the Python bible for hands on Machine Learning.&amp;nbsp; Well now, I can thankfully complete the trinity, with Luis Torgo's new book, 'Data Mining with R, Learning with Case Studies.'&lt;br /&gt;&lt;br /&gt;Both R novices and experts will find this a great reference for&amp;nbsp; Data Mining.&amp;nbsp; The opening chapter has a useful intro to get you started on R (Factors, Vectors, and Data Frames, as well as other useful objects are covered with examples).&amp;nbsp; Additional chapters cover both classification and regression type prediction schemes.&lt;br /&gt;&lt;br /&gt;The most useful chapter to readers here, however, is the chapter on 'Predicting Stock Market Returns.'&amp;nbsp; Many of the readers who have been looking for example scripts on some of the topics I've covered, will find them here. Not only is gathering and processing data (CSV,&amp;nbsp; quantmod and yahoo finance, and MySQL) well covered, but various prediction and evaluation schemes (cross validation, sliding and growing windows, PerformanceAnalytics package) are discussed along with access to the author's code.&amp;nbsp; Many topics I haven't discussed yet are available here as well, including MARS (Multivariate Adaptive Regression Splines), SVMs, and various validation techniques along with handy tabulation of results.&amp;nbsp; Having read a previous draft, I'm still working into the examples, and welcome any feedback and thoughts I can address.&lt;br /&gt;&lt;br /&gt;The book can be accessed via the amazon book showcase on the right and instructions for R code access are available in the book.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-5600003462654699569?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/5600003462654699569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/11/finally-practical-r-book-on-data-mining.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5600003462654699569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5600003462654699569'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/11/finally-practical-r-book-on-data-mining.html' title='Finally! A practical R book on Data Mining:  &quot;Data Mining With R, Learning with Case Studies,&quot; by Luis Torgo'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-24766559626079205</id><published>2010-08-10T13:22:00.000-07:00</published><updated>2010-08-10T14:17:16.551-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Conditioning Systems on Regime Variables'/><title type='text'>Conditioning Systems on Regime Variables</title><content type='html'>Here is a brief and simple example of switching systems based upon regime type (sometimes called gating). &lt;br /&gt;&lt;br /&gt;I've brought up the idea of conditioning systems based upon regimes many times in past posts.  Some texts call this filtering, although I prefer to use the term conditional gating.  
The simple idea is to turn on a certain system during certain conditions and, during alternate conditions, either switch systems or simply track the underlying series.  In this case the gating condition is the regime, which in turn is High or Low Volatility as measured by the VIX.&amp;nbsp; Although I'm not divulging the details of the underlying system itself, I've seen enough discussions in the public domain to feel that other traders have picked up on the ideas demonstrated here.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/TFsTNAupaQI/AAAAAAAAAQk/Q8zzwzFsJaw/s1600/term_wealth.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="267" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/TFsTNAupaQI/AAAAAAAAAQk/Q8zzwzFsJaw/s400/term_wealth.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&amp;nbsp;Fig 1.  Terminal Wealth vs. VIX threshold&lt;br /&gt;&lt;br /&gt;The animation below shows the system results during each step of the conditioning variable, the VIX. Notice the dramatic improvement at the value of 23. Also notice, as mentioned in earlier posts, how the optimal switching point of 23 is the most robust value, since even if the OOS results are to the left or right of the optimal switching point, they will be the best local values over a wide range of dependency. The astute observer might have noticed that this system is simply tracking buy&amp;amp;hold during low VIX regimes, while switching on system V during the high regimes. 
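&lt;br /&gt;&lt;br /&gt;The gating logic can be sketched in R as follows. Everything here is a stand-in: the series vix, mkt, and sys are simulated, the function terminal_wealth is a hypothetical helper, and the result says nothing about the actual system V or about the value 23. It is only meant to show the mechanics of sweeping a threshold on the conditioning variable.&lt;br /&gt;
&lt;pre&gt;
```r
set.seed(42)
n_days = 1000
vix = 15 + 10 * abs(rnorm(n_days))   # made-up volatility index levels
mkt = rnorm(n_days, 0.0003, 0.01)    # made-up daily returns of the underlying
sys = rnorm(n_days, 0.0005, 0.01)    # made-up daily returns of a stand-in system

# Gate on the previous day's level so the switch only uses information known in advance;
# -Inf on day one means the first day always tracks the underlying
vix_prev = c(-Inf, vix[-n_days])

terminal_wealth = function(thr) {
  gated = ifelse(vix_prev > thr, sys, mkt)   # system on in the high regime, buy and hold otherwise
  prod(1 + gated)
}

thresholds = 10:60
wealth = sapply(thresholds, terminal_wealth)
plot(thresholds, wealth, type = 'l', xlab = 'VIX threshold', ylab = 'terminal wealth')
```
&lt;/pre&gt;
Once the threshold exceeds every simulated VIX value, the gate never opens and the curve flattens at the buy and hold result.&lt;br /&gt;&lt;br /&gt;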
It is evident that the terminal wealth simply tracks buy &amp;amp; hold after a certain value of VIX, since it is always locked on to tracking mode under a certain threshold.&lt;br /&gt;&lt;br /&gt;The system is only shown in sample, however, I've found it to be pretty successful OOS as well.&lt;br /&gt;&lt;br /&gt;&lt;object height="344" width="425"&gt;&lt;param name="movie" value="http://www.youtube.com/v/ZcSyV0mMcA0&amp;hl=en&amp;fs=1"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/ZcSyV0mMcA0&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;Video 1. Stepping the Equity Curve system through linear VIX range.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-24766559626079205?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/24766559626079205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/08/conditioning-systems-on-regime.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/24766559626079205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/24766559626079205'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/08/conditioning-systems-on-regime.html' title='Conditioning Systems on Regime Variables'/><author><name>Intelligent 
Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7YSZm5NIAmQ/TFsTNAupaQI/AAAAAAAAAQk/Q8zzwzFsJaw/s72-c/term_wealth.jpg' height='72' width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-1639189057449934263</id><published>2010-08-02T23:25:00.000-07:00</published><updated>2010-08-03T11:35:03.335-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Quantitative Candlestick Pattern Recognition (Part 2 -- What&apos;s this Natural Language Processing'/><title type='text'>Quantitative Candlestick Pattern Recognition (Part 2 -- What's this Natural Language Processing stuff?)</title><content type='html'>&lt;b&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I wanted to briefly add one more thought regarding the temporal nature of probabilities as was alluded to in my correspondence with Adam, as well as the prior closing comments on the Chaos post (structure coalescing and dispersing).&lt;br /&gt;&lt;br /&gt;I will borrow from the field of Natural Language Processing and introduce one common visual description of how the states evolve over time using something called a Lexical Dispersion Plot.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/TFc5yBldJLI/AAAAAAAAAQc/wFl9hwC80VY/s1600/dsip_plot.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="176" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/TFc5yBldJLI/AAAAAAAAAQc/wFl9hwC80VY/s640/dsip_plot.jpg" width="640" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. 
Lexical (cluster state vocabulary) Dispersion Plot of Clustered Candlestick States over time&lt;br /&gt;&lt;br /&gt;In studies of language, we are often interested in observing how statistical patterns and relationships of sounds, characters, and words evolve over time.&amp;nbsp; Natural Language Processing is an entire field that has been dedicated to finding proper tools and vernacular to describe such statistics.&amp;nbsp; The idea of using a lexical dispersion plot is to observe how the lexicon itself evolves over time.&amp;nbsp; To give a simple example, we might take a corpus of common pop culture texts borrowed from some library, and look at the occurrence of the following three word states: "spider", "man", and "spider man".&amp;nbsp; The first two terms are isolated words, and the third term is called a bigram, which is a joint occurrence of two states in sequential order.&amp;nbsp;&lt;br /&gt;&lt;br /&gt;Now, although I haven't created the proposed lexical dispersion plot for the above scenario, one could reasonably expect the number of occurrences of the single words, spider and man, to be relatively frequent and uniform from about 1900 to say 1960, while the joint pair (spider,man) might occur relatively sparsely. However, beyond the 60s, we would notice an increase in the joint pair (spider,man) as the fictional character began to grow in popularity in the collective pop consciousness.&amp;nbsp; We might also expect a large frequency of the bigram to occur with the recent popularity of the films. However, it's possible that a few hundred years later, the joint term and the character's popularity might just wane and eventually die off, even though the two single-word terms (spider and man) are still frequently observed.&lt;br /&gt;&lt;br /&gt;Ok, so what's the point of this? 
Well, we are commonly taught in statistics that there is a population that exists to describe the ultimate best statistical model of any observational set, one that lies somewhat beyond the notion of time (much like Plato's ideas of forms existing behind the scenes to describe all nature over all time, for philosophy fans).&lt;br /&gt;&lt;br /&gt;But one of the things that disturbed me earlier on is exactly what I described in the prior paragraph on the joint bigram of spider man, which is that sometimes we have to pragmatically shed some of our beliefs about 'ideal' populations and just try to observe statistical phenomena as they occur temporally.&amp;nbsp; As mentioned in the Chaos quote, some patterns just spontaneously occur (spider man) for a while, then disappear over time. So the notion of a larger population existing behind the scenes (and all the statistical rigor associated with it) might be either overkill or even misleading towards our goal of trying to capture the essence of fleeting patterns. From a statistical viewpoint, I suppose I would lean more on the side of the Bayesian inference camp (constantly updating beliefs online) than on the frequentist approach.&lt;br /&gt;&lt;br /&gt;It's common knowledge in markets that financial time series are not IID (independent and identically distributed) over time. Rather, we accept that there are clusters of regions of behavior that tend to occur frequently together, and likewise, disappear over time (often reappearing again, though not always).&amp;nbsp; This body of knowledge, specifically related to volatility, is sensibly labeled as heteroscedasticity (differing variance) as opposed to homoscedasticity (constant variance) of observations. 
We might also notice such behavior being binned and quantified into certain 'regimes' of local stability.&lt;br /&gt;&lt;br /&gt;Now, if any of the above meandering made any sense, I will describe how it relates to the Quantitative Candlestick Pattern Recognition article. Recall that using clustering, we were attempting to identify a vocabulary of states that best describe a limited set of features (in the example, six states were identified) and that best partition related candlestick symbols by state in an unsupervised manner. However, the dispersion plot in Fig 1 shows that, viewed from the perspective of a central population, these states are not independently and identically distributed (IID) over time; rather, some tend to occur frequently over relatively long periods of time, while others appear and disappear for reasonable windows of time.&amp;nbsp; States one and two in the set tend to occur rather frequently, because they are very small moves (dojis and such), which tend to occur often over time. However, some of the larger moves captured in states 3 and 4 tend to persist for some periods, then disappear over other intervals.&amp;nbsp; The likely explanation is that larger moves tend to be associated with volatility, which as we know, exhibits heteroscedasticity (periods of high variance clustering together in time).  
Keep in mind, the dispersion example is not limited to single symbols over time, but can be extended to any number of n-gram pairs or symbols (such as the two word bigram state for spider man).&lt;br /&gt;&lt;br /&gt;With that knowledge in mind, it doesn't always make a whole lot of sense to try to develop and require a central fixed body of pattern statistics and related models over long periods of time, or even to treat many related statistical tests as necessary (things like n-fold cross validation over very large time series, bootstrap re-sampling methods with shuffling, and requiring decades of backtesting training data to obtain confidence that we found the best pattern vocabulary to describe data for all time). For instance, in one of the better books on statistics for traders, "Evidence based TA," by Aronson, many of the tests were conducted using t-tests of entire bodies of financial series and rules over a long period of time, while rejecting many potential pockets of temporal success, since they were thrown in and bootstrapped with much longer periods of data to draw conclusions about the statistical significance of hypotheses related to better than chance success. &lt;br /&gt;&lt;br /&gt;This is not to say that common trading statistics should be thrown out; not at all. Instead, the hope is to look at how the information being evaluated is processed over time (for instance, we may look at long term statistics of trade results, but focus more on short term statistics and modelling of the underlying patterns they are dependent upon).&lt;br /&gt;&lt;br /&gt;Additionally, we might be interested in breaking up the pattern information stream into smaller segments and observing and adapting to how the segments of data streams evolve and change over time. 
The key savior or benefit to us, is that these patterns in the data streams do tend to persist together for quite some time (often reasonably long), before dispersing and moving on to new forms of patterns.&amp;nbsp; There are several different machine learning concepts on the horizon that work with evolving (such as adding and pruning pattern model parameters) data streams over time and space. I have been spending some time evaluating one of them recently (although, I'm not saying which at the moment) which looks promising.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-1639189057449934263?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/1639189057449934263/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/08/quantitative-candlestick-pattern.html#comment-form' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1639189057449934263'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1639189057449934263'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/08/quantitative-candlestick-pattern.html' title='Quantitative Candlestick Pattern Recognition (Part 2 -- What&apos;s this Natural Language Processing stuff?)'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/_7YSZm5NIAmQ/TFc5yBldJLI/AAAAAAAAAQc/wFl9hwC80VY/s72-c/dsip_plot.jpg' height='72' width='72'/><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-662909035132577507</id><published>2010-07-06T09:59:00.000-07:00</published><updated>2010-07-10T19:20:53.186-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Chaos in the Financial Markets?'/><title type='text'>Chaos in the Financial Markets?</title><content type='html'>Over the years I've had quite a few interested individuals ask me about Chaos and its applications towards trading.  Well, as hidden markov models and speech processing were made popular by James Simons and his team at Renaissance Technologies, one could trace much of the popularity of Chaos theory and its financial applications to Norman Packard and Doyne Farmer, two former physicists working in the area of complex systems.  Much of their story is discussed in the book, 'the Predictors,' by Thomas Bass.  The two were on the forefront of new research in areas of complexity and chaos and had decided to parlay their knowledge into applications towards the financial markets as they founded the company called Prediction Company in Santa Fe, New Mexico.  The company was swallowed by UBS systems in 2005.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/TDNQ5E84hbI/AAAAAAAAAP0/GpQNHROkGTI/s1600/Lorenz.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="301" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/TDNQ5E84hbI/AAAAAAAAAP0/GpQNHROkGTI/s400/Lorenz.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 1.&amp;nbsp;&amp;nbsp; 2D slice of Lorenz Attractor Phase Space signature A.K.A. 
The butterfly effect.&lt;br /&gt;&lt;br /&gt;We could spend a lot of time discussing various facets of Chaos, as it is a very large field with many different related fields (such as fractals and complexity). But I want to focus on outlining a very simple understanding of why the field seemed so fascinating to traders and quants alike.  Most experts in time series run a slew of tests to demonstrate that markets exhibit no predictable order.&lt;br /&gt;However, what makes Chaos so fascinating is that certain time series may seem to pass a battery of common statistical tests for randomness, yet are perfectly deterministic.&lt;br /&gt;&lt;br /&gt;Chaos is a field of science that is engaged in studying non-linear dynamical behavior of systems. A popular example might be how different planetary bodies exert forces upon one another (see Poincare), or turbulent flow of various particles.  Those engaged in studying such systems often like to observe signatures in a domain known as phase space or state space. Rather than look at the time series unfold over time, they are looking for a type of order that underlies the trajectories of the system state dynamics as it unfolds over time.  The plot of the trajectory may show structural order in a series that appears random in the time domain.  One of the most popular attractor signatures is the well known Lorenz attractor, which is better known in popular literature as the butterfly effect. Fig 1, displayed earlier, shows a 2d slice of the phase space trajectory, which you can run yourself over at  &lt;a href="http://www.cmp.caltech.edu/%7Emcc/Chaos_Course/Lesson1/Demo8.html"&gt;chaos applet&lt;/a&gt;.   The famous signature displays a fascinating case of underlying order that relates to dynamic atmospheric convection in three dimensions.&lt;br /&gt;&lt;br /&gt;A simpler and more applicable model that we will look at for illustration purposes is the well known Logistic Map (also known as the quadratic map and the Feigenbaum map). 
This equation of a non-linear dynamical trajectory was investigated by, and is often attributed to, Robert May, a biologist studying models of fish populations.&lt;br /&gt;&lt;br /&gt;The recursive equation for the Logistic Map is: xnext = r*x*(1-x)&lt;br /&gt;Note that this is a feedback system in which we have some control over the dynamics of the system model by varying the value r.  What you'll see if you plot it out vs. the control coefficient, r, is that the series moves from a stable system to one which bifurcates into periodic cycles; and as r approaches the value 4 (beyond roughly 3.57) it starts to behave chaotically.  Chaotic behavior is aperiodic: like a financial series, it never repeats exactly; but unlike a financial series, it has an underlying deterministic order. In order to see the beauty of chaos and how it exhibits determinism, let's first look at the time series.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TDNRuddhXDI/AAAAAAAAAP8/ePZ5b2mxTOU/s1600/logistic_ts1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="326" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TDNRuddhXDI/AAAAAAAAAP8/ePZ5b2mxTOU/s400/logistic_ts1.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 2. Time Series of Logistic Map Equation.&lt;br /&gt;&lt;br /&gt;Notice that the 1st plot, which shows the 1st differenced time series, shows no signs of periodicity or determinism, and neither does the cumulative walk displayed in the 2nd. 
However, if we look at the signature of the same series in phase space, we see a completely different picture.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/TDNSPG1ZZHI/AAAAAAAAAQE/j685M13Ywow/s1600/logistic_map.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="372" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/TDNSPG1ZZHI/AAAAAAAAAQE/j685M13Ywow/s400/logistic_map.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 3. Phase State Plot of Logistic Map&lt;br /&gt;&lt;br /&gt;Notice it is clearly deterministic in this figure, i.e. given any point on the x axis, we can easily determine the exact corresponding point one step into the future on the y axis.&amp;nbsp; This should be fairly obvious, since the equation we started out with, xnext = r*x*(1-x) = r*x - r*x^2, is a downward-opening parabola. However, many such time series do not have such a simple equation and must be tested in various ways to determine structural chaos. There are other issues to contend with as well, including sensitivity to initial conditions and divergent trajectories due to finite computational precision.&lt;br /&gt;&lt;br /&gt;Ok, now that we understand all the hoopla about Chaos and a seemingly random signal having an underlying deterministic signature, what about a common financial time series?&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TDNTXQ0dh9I/AAAAAAAAAQM/IJJOmOwItvQ/s1600/rw1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="327" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TDNTXQ0dh9I/AAAAAAAAAQM/IJJOmOwItvQ/s400/rw1.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 4. 
Typical Random Walk (Financial) Time Series.&lt;br /&gt;&lt;br /&gt;The Random Walk shows no discernible order or periodicity, similar to the logistic equation series. But what if we observe the phase space trajectory?&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/TDNT4PMPcHI/AAAAAAAAAQU/jCAf8pwc5Gs/s1600/rw_map.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="372" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/TDNT4PMPcHI/AAAAAAAAAQU/jCAf8pwc5Gs/s400/rw_map.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 5. Phase Space trajectory of random walk.&lt;br /&gt;&lt;br /&gt;No Dice. Notice that the random walk shows zero determinism, which is what we expect given its Gaussian nature.&amp;nbsp; There are numerous methods to display higher order dimensional metrics as well (correlation lag plots, Lyapunov exponents, etc), and other than something called the compass rose, I've not personally seen much evidence of deterministic chaos in raw financial time series. Incidentally, the phase plot here is equivalent to a lag one scatterplot of returns for those more familiar with finance related statistics.&lt;br /&gt;&lt;br /&gt;A last note on the Prediction Company is that there is an often referenced paper on the Mackey-Glass equation (a non linear dynamic model of blood flow) by Meyer and Packard, whereby they used genetic algorithms to find underlying conditional order rule sets for the series.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;I'll end with perhaps the best excerpt from 'the Predictors,' which echoes much of my own focus and discoveries over the last decade...&lt;br /&gt;&lt;br /&gt;"One of the fundamental truths about the markets is that the dynamics are nonstationary," Norman explains. "We see no evidence for the existence of an attractor with stable statistical properties. 
This is what characterizes chaos -- having an attractor with stable statistical properties-- so what we are seeing is not chaos. It is something else. Call it an 'even-stranger-than-strange attractor,' which may not really be an attractor at all.&lt;br /&gt;&lt;br /&gt;The market might enter an epoch where some structure coalesces and sits there in a statistically stationary pattern, but then invariably it disappears. You have clouds of structure that coalesce and evaporate, coalesce and evaporate. Prediction Company's job is to find those pieces of structure that have the strongest signal and persist the longest. We want to know when the structure is beginning to emerge or dissolve because, once it begins to dissolve, we want to stop betting on it."&lt;br /&gt;&lt;br /&gt;...excerpt from the Predictors, Thomas A. Bass (1999).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-662909035132577507?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/662909035132577507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/07/chaos-in-financial-markets.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/662909035132577507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/662909035132577507'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/07/chaos-in-financial-markets.html' title='Chaos in the Financial Markets?'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/TDNQ5E84hbI/AAAAAAAAAP0/GpQNHROkGTI/s72-c/Lorenz.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-189906291512029333</id><published>2010-06-10T15:24:00.000-07:00</published><updated>2011-05-22T22:31:55.537-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='and all that)'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantitative Candlestick Pattern Recognition (HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='Baum Welch'/><title type='text'>Quantitative Candlestick Pattern Recognition (HMM, Baum Welch, and all that)</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBHXT4ptMZI/AAAAAAAAAPs/30AkoNSsqB0/s1600/cluster3d.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="347" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBHXT4ptMZI/AAAAAAAAAPs/30AkoNSsqB0/s400/cluster3d.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Fig 1. Clustering based approach to candlestick Pattern Recognition. &lt;br /&gt;&lt;br /&gt;I've been reading a book titled, 'the Quants,' that I'm sure will tantalize many traders with some of the ideas embedded within. Most notably (IMO), the notion that Renaissance's James Simons, hired a battery of cryptographers and speech recognition experts to decipher the code of the markets.  Most notable of the hired experts was, leonard Baum, co-developer of the Baum-Welch algorithm; an algorithm used to model hidden markov models.  
Now while I don't plan to divulge full details of my own work in this area, I do want to give a brief example of some methods to possibly apply with respect to these ideas.&lt;br /&gt;&lt;br /&gt;Now most practitioners of classical TA have built up an enormous amount of literature around candlesticks, the Japanese symbols used to denote a symbolic formation around open, high, low, and close daily data.  The problem as I see it is that most of the literature that is available only deals with qualitative recognition of patterns, rather than quantitative.&lt;br /&gt;&lt;br /&gt;We might want to utilize a more quantitative approach to analyzing the information, as it holds much more potential than single closing price data (i.e. each candle contains four dimensions of information).  The question is, how do we do this in a quantitative manner?&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;One well known method of recognizing patterns is the supervised method.&amp;nbsp; In supervised learning, we feed the learner a correct list of responses to learn and configure itself from; over many iterations, it comes up with the configuration that minimizes errors between the data it learns to identify and the data we feed as examples.&amp;nbsp; For instance, we might look at a set of hand-written characters and build a black box to recognize each letter by training via a neural net, support vector machine, or other supervised learning device. However, you probably don't want to spend hours classifying the types of candlesticks by name. Those familiar with candlesticks might recognize numerous different symbols: shooting star, hammer, doji, etc., each connoting a unique symbolic harbinger of the future to come. From a quantitative perspective, we might be more interested in understanding the Bayesian perspective; i.e. 
P(upday|hammer)=P(upday,hammer)/P(hammer), for instance.&lt;br /&gt;&lt;br /&gt;But how could we learn the corpus of symbols, without the tedious method of identifying each individual symbol by hand? This is a problem that may better be approached by unsupervised learning. In unsupervised learning, we don't need to train a learner; it finds relationships by itself. Typically, the relationships are established as a function of distance between exemplars.  Please see the data mining text (Witten/Frank) in my recommended list in order to examine the concepts in more detail.&amp;nbsp; In this case I am going to train using a very common unsupervised learner called k-means clustering.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBFgDg82FWI/AAAAAAAAAOc/_Xd3BAs2fl8/s1600/Qoriginal.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="210" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBFgDg82FWI/AAAAAAAAAOc/_Xd3BAs2fl8/s640/Qoriginal.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&amp;nbsp;Fig 2. Graph of arbitrary window of QQQQ data&lt;br /&gt;&lt;br /&gt;Notice the common time ordered candlestick form of plotting is displayed in Fig 2.&amp;nbsp; Now using k-means clustering, with a goal of identifying 6 clusters, I tried to automatically learn 6 unique candlestick forms based on H,L,Cl data relative to Open in this example. 
The idea is that similar candlestick archetypes will tend to cluster by distance.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/TBFhbH7BegI/AAAAAAAAAOk/NqTGxB-X8M0/s1600/Qcluster1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="209" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/TBFhbH7BegI/AAAAAAAAAOk/NqTGxB-X8M0/s640/Qcluster1.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 3. Candlestick symbols sorted by 6 Clusters&lt;br /&gt;&lt;br /&gt;Notice in Figure 3 that we can clearly see the k-means clustering approach automatically recognized large red bodies and green bodies, and even more interestingly, there is a preponderance of hammers that were automatically recognized in cluster number 5.&lt;br /&gt;&lt;br /&gt;So given that we have identified a corpus of 6 symbols in our language, of what use might this be? Well, we can run a cross tabulation of our symbol states using a program like R.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/TBFigk63t8I/AAAAAAAAAOs/W-LlDaJjrDo/s1600/table1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/TBFigk63t8I/AAAAAAAAAOs/W-LlDaJjrDo/s320/table1.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;Fig 4. Cross Tabulation of Clustered States.&lt;br /&gt;&lt;br /&gt;One of the things that strikes me right away is that there is an overwhelming number of pairs with state 1 following state 5; notice the 57% frequency trounces all other dependent states.&amp;nbsp; Now what is interesting about this? Remember we established that state 5 corresponds to a hammer candlestick? Well, common intuition (at least from my years of reading) expects that a hammer is a turning point that is followed by an up move. 
Yet, in our table we see it is overwhelmingly followed by state 1, which, if you look back at the sorted-by-cluster diagram, is a very big red down candlestick. This is completely opposite to what our common body of knowledge and intuition tells us.&lt;br /&gt;&lt;br /&gt;In case that seems hard to believe, we can re-sort the data again, this time in original time order, but with clusters identified.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBFzCPjsHTI/AAAAAAAAAPU/pKXwsXpsZ0o/s1600/Qstate5rev1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="234" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBFzCPjsHTI/AAAAAAAAAPU/pKXwsXpsZ0o/s640/Qstate5rev1.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;Fig 5.&amp;nbsp; Visual Inspection of hammer (state 5) likely followed by down candle (state 1)&lt;br /&gt;&lt;br /&gt;We can go back and re-sort the data and identify the states via the re-sorted cluster ID staircase level (5), or use labels to more simply identify the case of hammer (5) and its following symbol. Notice that, contrary to common knowledge, our automatic recognition process and tabulated probability matrix are well corroborated by our visual inspection. In the simple window sample (resized to improve visibility), 4 of the 5 instances of the hammer (state 5) were followed by a big red down candle (state 1). 
Now one other comment to make is that in case state 5 is not followed by state 1 (say, we bet on expecting a down move), it has a 14.3% chance of landing in state 6 on the next move, which brings our likelihood of a decent sized down move to 71.4% overall.&lt;br /&gt;&lt;br /&gt;We can take these simple quantitative ideas and extend them to MCMC dynamic models, Baum Welch and Viterbi algorithms, and all that sophisticated stuff. Perhaps one day even mimicking the mighty Renaissance itself?&amp;nbsp; I don't know, but any edge we can add to our arsenal will surely help.&lt;br /&gt;&lt;br /&gt;Take some time to read the Quants, if you want a great layman's view of many related quant approaches.&lt;br /&gt;&lt;br /&gt;There may be some bugs in this post, as google just seemed to update their editing platform, and I'm trying to iron out some of the kinks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-189906291512029333?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/189906291512029333/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/06/quantitative-candlestick-pattern.html#comment-form' title='25 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/189906291512029333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/189906291512029333'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/06/quantitative-candlestick-pattern.html' title='Quantitative Candlestick Pattern Recognition (HMM, Baum Welch, and all that)'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7YSZm5NIAmQ/TBHXT4ptMZI/AAAAAAAAAPs/30AkoNSsqB0/s72-c/cluster3d.jpg' height='72' width='72'/><thr:total>25</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-1042116335729943724</id><published>2010-05-25T10:41:00.001-07:00</published><updated>2011-04-29T15:06:49.071-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='The Kalman Filter For Financial Time Series'/><title type='text'>The Kalman Filter For Financial Time Series</title><content type='html'>Every now and then I come across a tool that is so bogged down in pages of esoteric mathematical calculations, it becomes difficult to get even a simple grasp of how or why they might be useful. Even worse, you exhaustively search the internet to find a simple picture that might express a thousand equations, but find nothing. The kalman filter is one of those tools. 
Extremely useful, yet very difficult to understand conceptually because of the complex mathematical jargon. Below is a simple plot of a Kalman filtered version of a random walk (for now, we will use that as an estimate of a financial time series).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S_wNkdp6ClI/AAAAAAAAAOU/hEYy4fWR9Rg/s1600/rw_plot.jpg" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5475266167062530642" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S_wNkdp6ClI/AAAAAAAAAOU/hEYy4fWR9Rg/s400/rw_plot.jpg" style="cursor: pointer; display: block; height: 250px; margin: 0px auto 10px; text-align: center; width: 400px;" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Kalman Filter estimates of mean and covariance of Random Walk&lt;br /&gt;&lt;br /&gt;The KF is a fantastic example of an adaptive model, more specifically, a dynamic linear model, that is able to adapt to an ever changing environment. Unlike a simple moving average or FIR filter that has a fixed set of windowing parameters, the Kalman filter constantly updates the information to produce adaptive filtering on the fly. Although there are a few TA based adaptive filters, such as the Kaufman Adaptive Moving Average and variations of the exponential moving average, none of them captures the optimal estimation of the series in the way that the KF does. In the plot in Fig 1, the blue line represents the estimated dynamic 'average' of the underlying time series, the red line represents the time series itself, and the dotted lines represent a scaled covariance estimate of the time series against the estimated average. 
Notice that unlike many other filters, the estimated average is a very good measure of the 'true' moving center of the time series.&lt;br /&gt;&lt;br /&gt;Without diving into too much math, the following are the well known 'state space equations' of the KF:&lt;br /&gt;xt = A*x(t-1) + w&lt;br /&gt;zt = H*xt + v&lt;br /&gt;&lt;br /&gt;Although these equations are often expressed in state space or matrix representation, making them somewhat complicated to the layman, they may make more sense if you are familiar with simple linear regression.&lt;br /&gt;Let's define the variables:&lt;br /&gt;xt is the hidden variable that is estimated; in this case it represents the best estimate of the dynamic mean or dynamic center of the time series.&lt;br /&gt;A is the state transition matrix. I often think of it as similar to the autoregressive coefficient in an AR model; think of it as Beta in a linear regression here.&lt;br /&gt;w is the process noise of the model.&lt;br /&gt;&lt;br /&gt;So, we can think of the equation xt = A*x(t-1) + w as being very similar to the basic linear regression model. The main difference is that the KF constantly updates the estimates at each iteration in an online fashion. Those familiar with control systems might understand it as a feedback mechanism that adjusts for error. Since we cannot actually 'see' the true dynamic center in the future, only estimate it, we think of x as a 'hidden' variable.&lt;br /&gt;&lt;br /&gt;The other equation is linked directly to the first.&lt;br /&gt;zt = H*xt + v&lt;br /&gt;zt is the measured noisy variable that has a probabilistic relationship to x.&lt;br /&gt;xt we recognize as the estimate of the dynamic center of the time series.&lt;br /&gt;H is the observation (measurement) matrix that maps the hidden state xt to the measurement zt.&lt;br /&gt;v is the measurement noise.&lt;br /&gt;&lt;br /&gt;Again, it is a linear model, but this time the equation contains something we can observe: zt is the value of the time series we are trying to capture and model with respect to xt. 
More specifically, the filter also maintains an estimate of the covariance, or co-movement, between the observed variable (the time series value) and the estimate of the dynamic variable x. You can also think of the scaled envelope it creates as similar to a standard deviation band that predicts the future variance of the signal with respect to x.&lt;br /&gt;&lt;br /&gt;Those familiar with hidden Markov models might recognize the concept of hidden and observed state variables displayed here.&lt;br /&gt;&lt;br /&gt;Basically, we start out with an initial guess of the average and covariance of the hidden series based upon measurements of the observable series, which in this case are simply the normal parameters N(mean, std) used to generate the random walk. From there, the linear matrix equations are used to predict the values of x and cov x. The key is that once a prediction is made, it is checked against the actual observed time series value, zt, and a parameter called K (the Kalman gain) is adjusted to update the prior estimates. Each time K is updated, the value of the estimate of x is updated via:&lt;br /&gt;xt_new_est = xt_est + K*(zt - H*xt_est)&lt;br /&gt;The value of K generally converges to a stable value when the underlying series is truly Gaussian (as seen at the start of the series in Fig 1, where it learns). After a few iterations, the optimal value of K is pretty stable, so the model has learned or adapted to the underlying series. 
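&lt;br /&gt;&lt;br /&gt;As a rough illustration of the predict and update steps described above, here is a minimal scalar sketch in Python. It is not the code used to produce Fig 1. The function name kalman_1d, the noise variances Q and R, and the starting values x0 and P0 are all assumptions chosen for illustration; A and H are both set to 1, so the hidden center is itself modeled as a slowly drifting random walk.&lt;br /&gt;
&lt;pre&gt;
```python
import random


def kalman_1d(zs, A=1.0, H=1.0, Q=0.1, R=1.0, x0=0.0, P0=1.0):
    """Scalar Kalman filter for xt = A*x(t-1) + w and zt = H*xt + v.

    Q and R are the (assumed) variances of the noise terms w and v.
    Returns the filtered estimates of x and the gain K used at each step.
    """
    x_est, P = x0, P0
    estimates, gains = [], []
    for z in zs:
        # Predict the next state and its variance from the model.
        x_pred = A * x_est
        P_pred = A * P * A + Q
        # Kalman gain: how much weight the new measurement receives.
        K = P_pred * H / (H * P_pred * H + R)
        # Correct the prediction with the measurement residual.
        x_est = x_pred + K * (z - H * x_pred)
        P = (1.0 - K * H) * P_pred
        estimates.append(x_est)
        gains.append(K)
    return estimates, gains


# Hypothetical input: a Gaussian random walk standing in for a price series.
random.seed(1)
walk, level = [], 0.0
for _ in range(500):
    level += random.gauss(0.0, 1.0)
    walk.append(level)

estimates, gains = kalman_1d(walk)
print(round(gains[0], 3), round(gains[-1], 3))
```
&lt;/pre&gt;
With fixed Q and R the gain depends only on those variances, not on the data, so it starts high and settles to a constant after a handful of steps. That is the convergence of K mentioned above. A larger Q/R ratio gives a more responsive but noisier estimate.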
&lt;br /&gt;&lt;br /&gt;Some advantages of the Kalman filter are that it is predictive and adaptive, as it looks forward with an estimate of the covariance and mean of the time series one step into the future, and unlike a Neural Network, it does NOT require stationary data.&lt;br /&gt;Those working on the Neural Network tutorials will hopefully see a big advantage here.&lt;br /&gt;&lt;br /&gt;It gives a very nearly smooth representation of the series, while not requiring peeking into the future.&lt;br /&gt;&lt;br /&gt;Disadvantages are that the filter model assumes linear dependencies and Gaussian noise terms. As we know, financial markets are not exactly Gaussian: they have fat tails more often than we would expect, non-normal higher moments, and the series exhibit volatility clustering (heteroskedasticity). Another more advanced filter that addresses these issues is the particle filter, which uses sampling methods to approximate the underlying distributions.&lt;br /&gt;&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;Here are some references which may further help in understanding the Kalman filter.&lt;br /&gt;In addition, there is a Kalman smoother in the R package dlm.&lt;br /&gt;&lt;br /&gt;http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html&lt;br /&gt;&lt;br /&gt;If you are interested in a Python based approach, I highly recommend the following book: Machine Learning: An Algorithmic Perspective&lt;br /&gt;&lt;br /&gt;&lt;iframe frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://rcm.amazon.com/e/cm?lt1=_blank&amp;amp;bc1=000000&amp;amp;IS2=1&amp;amp;bg1=FFFFFF&amp;amp;fc1=000000&amp;amp;lc1=0000FF&amp;amp;t=ntelligenttra-20&amp;amp;o=1&amp;amp;p=8&amp;amp;l=as1&amp;amp;m=amazon&amp;amp;f=ifr&amp;amp;md=10FE9736YVPPT7A0FBG2&amp;amp;asins=1420067184" style="height: 240px; width: 120px;"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;br /&gt;Not only is there a fantastic writeup on hidden markov models and kalman filters, but there is real code you can replicate. It is one of the best practical books on Machine Learning I have come across-- period.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-1042116335729943724?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/1042116335729943724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/05/kalman-filter-for-financial-time-series.html#comment-form' title='38 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1042116335729943724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1042116335729943724'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/05/kalman-filter-for-financial-time-series.html' title='The Kalman Filter For Financial Time Series'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S_wNkdp6ClI/AAAAAAAAAOU/hEYy4fWR9Rg/s72-c/rw_plot.jpg' height='72' width='72'/><thr:total>38</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3500610601513558481</id><published>2010-05-12T22:04:00.000-07:00</published><updated>2010-05-13T00:03:24.070-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Is it possible to get a causal smoothed filter ?'/><title type='text'>Is it possible to get a causal smoothed filter ?</title><content type='html'>Although I haven't been all that much of a fan of moving average based methods, I've observed some discussions and made some attempts to determine if it's possible to get an actual smoothed filter with a causal model.  Anyone who's worked on financial time series filters knows that the bane of filtering is getting a smooth response with very low delay.  Ironically, one would think that you need a very small moving average length to accomplish a causal filter with decent lag properties; often a sacrifice is made between choosing a large parameter to obtain decent smoothing at the cost of lag.&lt;br /&gt;&lt;br /&gt;I put together the following FIR based filter using QQQQ daily data for about 1 year worth of data.  It is completely causal and described by .. gasp.. 250 coefficients.&lt;br /&gt;&lt;br /&gt;Does it appear smooth? You decide.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S-uJTii9Y4I/AAAAAAAAAOE/jpaCzr4Z5vo/s1600/causal1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 271px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S-uJTii9Y4I/AAAAAAAAAOE/jpaCzr4Z5vo/s400/causal1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5470617141155554178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. 
FIR 250 tap feed forward filter&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S-uJfRBhm3I/AAAAAAAAAOM/p34jPC2Otgk/s1600/impulse.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 246px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S-uJfRBhm3I/AAAAAAAAAOM/p34jPC2Otgk/s400/impulse.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5470617342610348914" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. 250 weight impulse response determining coefficients&lt;br /&gt;&lt;br /&gt;The impulse response is approximately a sinc function, which is the discrete inverse transform for an ideal 'brick wall' low pass filter.&lt;br /&gt;&lt;br /&gt;I haven't actually verified much out of sample at the moment, so it's quite possible that the model may not fare as well; it remains to be investigated. However, thought I would share this work to give some ideas about potential of causal filtering methods.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-3500610601513558481?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3500610601513558481/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/05/is-it-possible-to-get-causal-smoothed.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3500610601513558481'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3500610601513558481'/><link rel='alternate' type='text/html' 
href='http://intelligenttradingtech.blogspot.com/2010/05/is-it-possible-to-get-causal-smoothed.html' title='Is it possible to get a causal smoothed filter ?'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7YSZm5NIAmQ/S-uJTii9Y4I/AAAAAAAAAOE/jpaCzr4Z5vo/s72-c/causal1.jpg' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-5031119327653236157</id><published>2010-04-28T18:50:00.000-07:00</published><updated>2010-04-29T14:23:02.867-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wavelet Spectrogram Non-Stationary Financial Time Series analysis using R (TTR/Quantmod/dPlR) with USDEUR'/><title type='text'>Wavelet Spectrogram Non-Stationary Financial Time Series analysis using R (TTR/Quantmod/dPlR) with USDEUR</title><content type='html'>I've been doing some research lately regarding types of spectral imaging and decomposition techniques that apply to non-stationary signals. As mentioned earlier, one of the major problems with the simple fourier analysis is that the basis functions extend to infinity in both directions and the signals are assumed to be stationary.  Although, I won't expand too much right now, one of the advantages of wavelets is that they use local small windowed basis functions, allowing them to capture not only non-stationary signals, but signals that are aperiodic: two large advantages over fourier based methods when dealing with financial time series.&lt;br /&gt;&lt;br /&gt;I put together a few small examples to understand how to visually understand a spectrogram. 
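&lt;br /&gt;&lt;br /&gt;For readers who want to experiment outside of R, the first example (a 58 day sine sampled over 2048 days) can be roughly sketched in plain Python. This is not the dplR code behind the figures. The function name morlet_power, the Morlet parameter W0 = 6, and the short list of candidate periods are assumptions made for illustration. The sketch evaluates the Morlet wavelet power at the midpoint of the series for a handful of periods and reports which one dominates.&lt;br /&gt;
&lt;pre&gt;
```python
import cmath
import math

W0 = 6.0  # Morlet centre frequency; a common default, assumed here


def morlet_power(x, period, centre):
    """Morlet wavelet power of series x at one Fourier period and one time index."""
    # Convert the Fourier period (in samples) to the matching Morlet scale.
    scale = period * (W0 + math.sqrt(2.0 + W0 * W0)) / (4.0 * math.pi)
    norm = math.pi ** -0.25 / math.sqrt(scale)
    total = 0j
    for n, value in enumerate(x):
        t = (n - centre) / scale
        # Conjugate Morlet wavelet: complex sinusoid under a Gaussian envelope.
        total += value * norm * cmath.exp(-1j * W0 * t) * math.exp(-0.5 * t * t)
    return abs(total) ** 2


N = 2048  # 2^11 daily samples, as in the first figure
signal = [math.sin(2.0 * math.pi * n / 58.0) for n in range(N)]

# Hypothetical set of candidate periods (in days) to compare.
periods = [8, 16, 32, 58, 128, 256, 512]
power = {p: morlet_power(signal, p, N // 2) for p in periods}
dominant = max(power, key=power.get)
print(dominant)
```
&lt;/pre&gt;
Because the Gaussian envelope is local, the same calculation can be repeated at other time indices to see whether a cycle persists. A full spectrogram is this calculation over a grid of times and scales.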
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jmv-Zq55I/AAAAAAAAANk/x-ouTUAWIBs/s1600/58_cycle1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 327px; height: 400px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jmv-Zq55I/AAAAAAAAANk/x-ouTUAWIBs/s400/58_cycle1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5465371859693004690" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Simple 58 day cycle captured with 11 octaves and 2048 (2^11) data points&lt;br /&gt;&lt;br /&gt;As in earlier tutorial based posts, we use a simple 58 day cycle to show the basic sine based time series waveform. The plot on the bottom is known as a spectrogram. The wavelet operation behind this spectrogram is a continuous wavelet transform using a Morlet wavelet. The package is dplR (the Dendrochronology Program Library in R) put together by &lt;a href="http://myweb.facstaff.wwu.edu/bunna/dplR.htm"&gt;  Andy Bunn &lt;/a&gt;. The package was designed to analyze tree rings. Notice that a multitude of fields use this type of technology, ranging from MRI, to climatology, to speech processing. It is, IMO, the modern day version of DFT type spectral tools (however, for non-stationary and aperiodic signals). When looking at the spectrogram plot, please keep in mind the units are Days, not Years (I need to see how to alter that; hopefully Dr. Bunn is listening =). &lt;br /&gt;&lt;br /&gt;The time scale represents linear time, here a window of 2048 days that was sampled. We could have used any time series, but it needs to be of length 2^N; if not, there is a function to pad the rest of the data with zeros to make up that length. The vertical scale is a log scale that shows what are called 'octaves'. 
Borrowing from musical vernacular, we can think of them as scales, each double the previous one, which represent localized frequency energy information at those scales. The colors represent the heat or power of the signal in regions of interest. Because of edge effects in this transform, we ignore the uncertain information outside of the dark parabolic region (cone of influence). It is clear that the highest power is the dark red region right at around 58 days. What is important here is not so much to understand the exact value of the cycle, but the persistence of the dominant cycle(s). We notice the cycle persists throughout the entire length of the time series in the spectrogram (much as we would expect from the 2D time series plot).&lt;br /&gt;&lt;br /&gt;What happens if we use different frequencies that change over time? Here we notice a clear advantage over Fourier based methods. A Fourier based decomposition would be able to locate the dominant tones; however, because its basis functions extend infinitely in time, it would not show when each frequency is present.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jqCcqgKZI/AAAAAAAAANs/ijHl68CaK40/s1600/composite_cycles1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 327px; height: 400px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jqCcqgKZI/AAAAAAAAANs/ijHl68CaK40/s400/composite_cycles1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5465375475589196178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. Composite Stationary Time Series comprised of 3 dominant tones&lt;br /&gt;&lt;br /&gt;Notice that we can clearly see the regions of dominant tones by following the chart and looking for the most concentrated power (red) regions, which are around 48, 253, and 532 day cycles. 
We also notice that the power density can be viewed in terms of time context, our eyes simply follow along in time and observe strong regions of signal energy concentration.&lt;br /&gt;&lt;br /&gt;Ok, but what about if the signal itself is non-stationary?&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S9jqsymjGsI/AAAAAAAAAN0/Q6sj2etC1ps/s1600/composite_exp3.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 327px; height: 400px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S9jqsymjGsI/AAAAAAAAAN0/Q6sj2etC1ps/s400/composite_exp3.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5465376203032697538" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Composite signal added to exponential curve to make signal non-stationary&lt;br /&gt;&lt;br /&gt;Notice, that even though we now have a non-stationary signal, the regions of underlying cyclic component stability are still detectable by eye!&lt;br /&gt;&lt;br /&gt;Lastly, a financial time series of USDEUR was captured via TTR/Quantmod packages.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jrJbZALDI/AAAAAAAAAN8/WGFSebFEcak/s1600/usdeur1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 327px; height: 400px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jrJbZALDI/AAAAAAAAAN8/WGFSebFEcak/s400/usdeur1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5465376695018073138" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 4. 
USDEUR time series spectrogram&lt;br /&gt;&lt;br /&gt;Notice that even with the non-stationary financial signal, there is a very clear dominant cycle pattern that persists at roughly 255 days (anyone familiar with trading recognizes that as the approximate number of trading days per year).&lt;br /&gt;&lt;br /&gt;Keep in mind that sampling also introduces aliases (and spreading), which may look like periodic signals but are merely digital artifacts of the underlying sampled signal. We also see the very short term noise present in the lowest scales at the bottom.&lt;br /&gt;&lt;br /&gt;Another interesting application is that it may be used not only as a modern tool to augment non-stationary decomposition; for those familiar with pattern based techniques, it (and its periodogram counterpart) is also often used in pattern recognition and Markov type modeling.&lt;br /&gt;&lt;br /&gt;Hopefully, you have gained some appreciation for wavelet based spectral techniques vs. Fourier based spectral analysis.  
&lt;br /&gt;&lt;br /&gt;I have been debating whether to break up the post, but because I was added to the R bloggers thread, I wanted the post to be complete for local readers.&lt;br /&gt;&lt;br /&gt;That's it for now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-5031119327653236157?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/5031119327653236157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/04/wavelet-spectograph-analysis-using-r.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5031119327653236157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5031119327653236157'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/04/wavelet-spectograph-analysis-using-r.html' title='Wavelet Spectrogram Non-Stationary Financial Time Series analysis using R (TTR/Quantmod/dPlR) with USDEUR'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7YSZm5NIAmQ/S9jmv-Zq55I/AAAAAAAAANk/x-ouTUAWIBs/s72-c/58_cycle1.jpg' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-8469885896684368893</id><published>2010-04-03T09:32:00.000-07:00</published><updated>2010-04-23T14:25:36.920-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Why isn&apos;t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? Part 2'/><title type='text'>Why isn't my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? Part 2</title><content type='html'>I created an example to show how the theory from part 1 might be applied using S&amp;P500 as a proxy for performance.  Just in case anyone viewing is not familiar with terminal wealth, it is the final (usually compounded) ending value (hence, terminal) of the account.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S7dt1xfpM6I/AAAAAAAAANc/0jEiQfz3v48/s1600/example_SP500.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 352px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S7dt1xfpM6I/AAAAAAAAANc/0jEiQfz3v48/s400/example_SP500.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5455950244169200546" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Example of S&amp;P 500 and using GBM monte carlo simulations for terminal wealth&lt;br /&gt;&lt;br /&gt;A monte carlo simulation of GBM, using historical daily%change parameters(mean,std), was run for 10,000 iterations of a time series length=1000. The length was chosen to approximate slices of about 3yrs for summary statistics of terminal wealth (a good approximation for market timing).  I also used the long term historical mu and std of the series, although it might be a bit biased towards longer horizons. Possibly, I could generate more of a 3yr sampling distribution of N(u,std), for more relevance, but for now we'll assume the long run parameters are a good approximation.&lt;br /&gt;&lt;br /&gt;Graphical summary statistics using boxplots and density estimates are shown for the monte carlo simulations. 
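&lt;br /&gt;&lt;br /&gt;To make the simulation setup concrete, here is a rough sketch in Python rather than the original R. It is not the code behind Fig 1. The daily mean and standard deviation (MU, SIGMA) are hypothetical placeholders and not the fitted historical values, the path count is cut to 1,000 to keep it quick, and the helper name terminal_wealth is made up for illustration. Each path draws 1000 normally distributed daily returns and compounds them at 1X, +2X and -2X with daily rebalancing.&lt;br /&gt;
&lt;pre&gt;
```python
import random
import statistics


def terminal_wealth(daily_returns, leverage):
    """Ending value of 1 unit invested, compounding leverage * return each day."""
    wealth = 1.0
    for r in daily_returns:
        wealth *= 1.0 + leverage * r
    return wealth


random.seed(42)
MU, SIGMA = 0.0003, 0.01  # hypothetical daily mean and std, NOT the fitted values
DAYS, PATHS = 1000, 1000  # the post used 10,000 paths; fewer here for speed

# Terminal wealth per path for the 1X, +2X and -2X instruments.
results = {1: [], 2: [], -2: []}
for _ in range(PATHS):
    path = [random.gauss(MU, SIGMA) for _ in range(DAYS)]
    for lev in results:
        results[lev].append(terminal_wealth(path, lev))

medians = {lev: statistics.median(vals) for lev, vals in results.items()}
for lev in (1, 2, -2):
    print(lev, round(medians[lev], 3))
```
&lt;/pre&gt;
One thing this makes easy to check: on any single day the +2X and -2X factors multiply to 1 - 4r^2, which is never above 1, so on every simulated path the two leveraged instruments cannot both come out ahead.&lt;br /&gt;&lt;br /&gt;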
What strikes me at first glance is that the -2x instrument performs absolutely horribly in most cases, adding to the common knowledge that markets have upward drift. If you are ever stuck holding a position, just hope it isn't short (we've all experienced the deer in the headlights phenomenon at one time or another); statistically, it is not the best side to be stuck on for any long period.  &lt;br /&gt;&lt;br /&gt;Another more interesting observation, however, is that the mode of the simple 1X underlying instrument lies to the right of the modes of the other density estimates. In addition, you are clearly taking on wider variance/risk by using the positive (and negative) leveraged 2x instrument. In essence, you are seeing some of the Kelly criterion principles at work here. By taking on 2X risk, while you have a chance of larger gains, statistically you are not likely to do much better than 1x, while taking on far greater risk on the negative side.&lt;br /&gt;&lt;br /&gt;Lastly, there are two sample slices shown of the actual results, using arbitrary periods of performance. It is clear that during periods of long trends, we have much better growth in the 2X instrument. Unfortunately, we don't know when those trends will occur, and according to the monte carlo sims, they are not that likely to occur.&lt;br /&gt;&lt;br /&gt;The most recent performance displayed is a perfect example of a series where both 2X instruments performed worse than the underlying, as explained in part 1.&lt;br /&gt;&lt;br /&gt;Below is a summary of the terminal wealth of the three series, ser(1X), ser2pos(+2X), ser2neg(-2X)&lt;br /&gt;&gt; summary(ser)&lt;br /&gt;   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. &lt;br /&gt; 0.3613  1.0800  1.3290  1.3870  1.6250  4.7460 &lt;br /&gt;&gt; summary(ser2pos)&lt;br /&gt;   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. &lt;br /&gt; 0.1178  1.0630  1.6100  1.9180  2.4070 20.5700 &lt;br /&gt;&gt; summary(ser2neg)&lt;br /&gt;   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
&lt;br /&gt; 0.0337  0.2859  0.4279  0.5173  0.6483  5.6480 &lt;br /&gt;&lt;br /&gt;Notice the Median of +2X is nowhere near 2 times the Median of the underlying. Although 2X has some fantastic outliers, you shouldn't expect them statistically.&lt;br /&gt;It's sort of like tossing a coin with compounding the full amount, whereby, you get a fantastic result for the winning outcome, unfortunately, there is a 75% probability of going bankrupt (maybe I'll cover that one another time).&lt;br /&gt;&lt;br /&gt;One final comment is that the monte carlo sims used GBM, whereas a more likely jump diffusion process would create much fatter tails, meaning even more neg tail risk against the potentially nice looking 2X instrument potential gains.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-8469885896684368893?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/8469885896684368893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/04/why-isnt-my-2x-ultra-etf-keeping-pace.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8469885896684368893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8469885896684368893'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/04/why-isnt-my-2x-ultra-etf-keeping-pace.html' title='Why isn&apos;t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)? 
Part 2'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S7dt1xfpM6I/AAAAAAAAANc/0jEiQfz3v48/s72-c/example_SP500.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3049070481141247689</id><published>2010-03-31T07:39:00.000-07:00</published><updated>2010-04-23T14:26:06.450-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Why isn&apos;t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)?'/><title type='text'>Why isn't my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)?</title><content type='html'>I've been reading a few articles lately, lambasting ultra ETFs for not keeping up with markets and ascribing the problem to weird unexplainable reasons such as portfolio derivative re-balancing and negative drift.  I thought it would be nice to revisit the concept of path asymmetry.  
Although there are many different definitions of price asymmetry (in econometrics, for example), in this case I'm simply referring to the asymmetrical nature of percentage price movements vs dollar movements and their final cumulative outcome given any arbitrary path.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S7NkK735ZVI/AAAAAAAAANU/8Tbzm2Qr-w8/s1600/path_ex.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 326px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S7NkK735ZVI/AAAAAAAAANU/8Tbzm2Qr-w8/s400/path_ex.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5454813712709412178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Example of ultra 2X ETFs and path asymmetry&lt;br /&gt;&lt;br /&gt;Many people seem to find it incomprehensible (if not reprehensible) that an underlying series may move in a certain direction, yet the ultra short and ultra long series both finish below the underlying over the long run. What exactly is path asymmetry? Some traders might be familiar with the notion that if you lose some percentage of your account, like 50%, you need more than 50% (a 100% gain, in fact) to make up for the loss. That is an example of path asymmetry (I should note someone also mentioned it's an example of Siegel's paradox).&lt;br /&gt;&lt;br /&gt;Let's look at a very simple example of how this might affect a stock and its 2x counterparts. Suppose a stock moves from 100 dollars to 80 and back to 100 again, i.e. break-even. The move from 100 to 80, on a percentage basis, was a 20% loss. However, to recoup that amount, we need to solve for 80*(1+x)=100; the answer is 25%, not 20%. This means even though the dollar amount is identical for both moves (20 dollars down and up), the percentage amount is not. That is an example of path asymmetry. How does this affect the 2X ultra Leveraged ETFs?  
Well, since each ETF is designed to track twice the daily move of the underlying, the +2x ETF will move 40% down (from 100 to 60), then it will move 50% back up, for an ending value of 90 dollars. The -2x ETF will move up 2x, or 40%, to 140, and then retrace 50%, leaving it at only 70 dollars. Notice that in both cases, the ultra ETF ends up below the underlying price. It is the simple mechanics of path dependency and asymmetry that account for this, even with perfect 2x leveraging. It is important to take path dependencies into account when dealing with any leveraged product, including when hedging.&lt;br /&gt;&lt;br /&gt;Now keep in mind, there is additional drag on these products due to fund expenses, which does add merit to the original question. More on this is explained succinctly in this Seeking Alpha article by &lt;a href="http://seekingalpha.com/article/35789-the-case-against-leveraged-etfs"&gt; Tristan Yates and Lye Kok &lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-3049070481141247689?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3049070481141247689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/03/why-isnt-my-2x-etf-keeping-pace-with.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3049070481141247689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3049070481141247689'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/03/why-isnt-my-2x-etf-keeping-pace-with.html' title='Why isn&apos;t my 2X Ultra ETF keeping pace with the market and what is path 
asymmetry (R ex)?'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S7NkK735ZVI/AAAAAAAAANU/8Tbzm2Qr-w8/s72-c/path_ex.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-2962267645188864028</id><published>2010-03-24T16:39:00.001-07:00</published><updated>2010-03-24T21:26:17.799-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Modified Donchian Band Trend Follower using R'/><category scheme='http://www.blogger.com/atom/ns#' term='TTR  -Part 2: Parameter Sweep Sensitivity over long run'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantmod'/><title type='text'>Modified Donchian Band Trend Follower using R, Quantmod, TTR  -Part 2: Parameter Sweep Sensitivity over long run</title><content type='html'>Here is a small update to the Donchian Channel type system I displayed in the last post.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S6qi1IauHlI/AAAAAAAAAM8/nbpzqEhUETs/s1600/sens_sweepGSPC.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 339px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S6qi1IauHlI/AAAAAAAAAM8/nbpzqEhUETs/s400/sens_sweepGSPC.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5452349332561731154" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Sensitivity of Net Combined L/S Gain to parameter n.&lt;br /&gt;&lt;br /&gt;Using the S&amp;P500 index as a proxy for the market, a simulation was run over the lifetime of the index. 
Notice the system excels in both the very short run and much longer periods. The short system did very poorly overall and performed nowhere near as well as the long side in any of the overall periods (except maybe very short term). A possible explanation is that short side systems do not do very well in the long run due to the upward drift of markets. In addition, short side runs do not have the inherent compounding power of long sides, as the payoffs are asymmetrical. The most you can gain on a short is 100% of your original stake (the price falling to zero), whereas the long side is unlimited (one way around this limitation is using inverse ETFs). I believe many common simulators err in the effects and method of this computation.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S6qkmJuBIFI/AAAAAAAAANE/ijRTnfgd6gk/s1600/long_term.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 387px; height: 400px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S6qkmJuBIFI/AAAAAAAAANE/ijRTnfgd6gk/s400/long_term.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5452351274236321874" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. Some long term results of strategy with parameter n=140&lt;br /&gt;&lt;br /&gt;The above figure shows the results of choosing a parameter near the optimal region. In light of commissions and limited short strategy performance over longer periods, it might pay to use the long only portion of the strategy. Another observation is to possibly step aside during highly volatile regions in order to capture the beneficial areas of the long strategy. 
Some of the methods to approach this type of regime switching have been mentioned in earlier posts.&lt;br /&gt;&lt;br /&gt;One last comment to think about when hearing detractors regarding 'curve fitting' and optimization is that, as evidenced in the above simulation, you will often find that the locally optimal parameter value turns out to be the most robust, since performance stays strong across a wide range of neighboring parameter values.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-2962267645188864028?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/2962267645188864028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/03/modified-donchian-band-trend-follower.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/2962267645188864028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/2962267645188864028'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/03/modified-donchian-band-trend-follower.html' title='Modified Donchian Band Trend Follower using R, Quantmod, TTR  -Part 2: Parameter Sweep Sensitivity over long run'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7YSZm5NIAmQ/S6qi1IauHlI/AAAAAAAAAM8/nbpzqEhUETs/s72-c/sens_sweepGSPC.jpg' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-5446265112613054225</id><published>2010-03-12T12:47:00.000-08:00</published><updated>2010-03-12T16:18:20.747-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Modified Donchian Band Trend Follower using R'/><category scheme='http://www.blogger.com/atom/ns#' term='TTR'/><category scheme='http://www.blogger.com/atom/ns#' term='Quantmod'/><title type='text'>Modified Donchian Band Trend Follower using R, Quantmod, TTR</title><content type='html'>I've been toying around with the examples given on the &lt;a href="http://blog.fosstrading.com/2009/04/testing-rsi2-with-r.html"&gt; FOSS trading site &lt;/a&gt; for some of the great work they've put together in the Quantmod and TTR packages.  Those viewers who are looking for a nice (and free) backtesting suite to possibly complement some of your other results or work in say, Weka, should familiarize yourselves with R.  Not only can it serve as a canvas to simulate ideas and concepts, but can process the backend results towards more trader oriented metrics, than using something like Weka as a standalone tool. As you gain more proficiency in data mining and machine learning concepts in Weka, you can also make the move to integrate the tools inside of R, as R contains the majority of machine learning schemes inside of various packages.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S5qpA5o2XAI/AAAAAAAAAMs/-IY4av-T11Y/s1600-h/donch_ex.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 361px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S5qpA5o2XAI/AAAAAAAAAMs/-IY4av-T11Y/s400/donch_ex.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5447852532195286018" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. 
Modified Donchian Channel System Simulation&lt;br /&gt;&lt;br /&gt;As an example of how to use some of their tools (along with traditional R packages) for fast prototyping, I put together an example of a modified Donchian Channel trend following system along with how you might simulate it using R. The typical Donchian Channel Bands are used as breakout entry and exit signals, i.e. once an n period high has been breached you go long, then exit when the n period low has been breached -- vice versa for short. In this example, however, we simply enter long when the close breaks above the average line and stay long as long as it remains above. A short signal is entered on the average line break to the downside. Unlike a price/moving average type system, there wasn't a lot of choppiness causing false starts around the average line, which is a plus.&lt;br /&gt;&lt;br /&gt;I am still trying to familiarize myself more with the tools, and am still at the point where I like how simple and fast the static vector computations work (similar to numpy in Python), but I am wondering how fast more sophisticated entry/exits requiring loops will work.  
I still expect to work on some of these types of scenarios, as I am really enjoying the capabilities of R along with some of these trading oriented packages.&lt;br /&gt;&lt;br /&gt;Although the system (using QQQQ as an example) is in no way optimized nor analyzed for robustness, it returned a respectable 60% versus a buy and hold loss over the past roughly two years (showing a simple example of trend type trading).&lt;br /&gt;&lt;br /&gt;Here is the complete code for you to replicate (I used the R version 2.7.10.1).&lt;br /&gt;Note: if some of it looks familiar to the FOSS RSI example, it is exactly because I used that example as a starting point, so there will be some overlap in comments and actions.&lt;br /&gt;&lt;br /&gt;# We will need the quantmod package for charting and pulling&lt;br /&gt;# data and the TTR package to calculate Donchian Bands.&lt;br /&gt;# You can install packages via: install.packages("packageName")&lt;br /&gt;# install.packages(c("quantmod","TTR"))&lt;br /&gt;# See Foss Trading Blog for RSI template&lt;br /&gt;library(quantmod)&lt;br /&gt;library(TTR)&lt;br /&gt;&lt;br /&gt;tckr&lt;-"QQQQ"&lt;br /&gt;tckr_obj&lt;-QQQQ&lt;br /&gt;&lt;br /&gt;start&lt;-"2008-01-01"&lt;br /&gt;end&lt;- "2010-03-08"&lt;br /&gt;&lt;br /&gt;# Pull tckr index data from Yahoo! 
Finance&lt;br /&gt;getSymbols(tckr, from=start, to=end)&lt;br /&gt;QQQQ.cl&lt;-QQQQ[,6]&lt;br /&gt;QQQQ.H&lt;-QQQQ[,2]&lt;br /&gt;QQQQ.L&lt;-QQQQ[,3]&lt;br /&gt;dc&lt;-DonchianChannel(cbind(QQQQ.H,QQQQ.L),n=80)&lt;br /&gt;&lt;br /&gt;#Plotting Donchian Channel&lt;br /&gt;ymin=25&lt;br /&gt;ymax=55&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;par(mfrow=c(2,2), oma=c(2,2,2,2))&lt;br /&gt;&lt;br /&gt;# max, avg, min &lt;- red, blue, green&lt;br /&gt;plot(dc[,1],col="red",ylim=c(ymin,ymax),main="")&lt;br /&gt;par(new=T)&lt;br /&gt;plot(dc[,2],col="blue",ylim=c(ymin,ymax),main="")&lt;br /&gt;par(new=T)&lt;br /&gt;plot(dc[,3],col="green",ylim=c(ymin,ymax),main="")&lt;br /&gt;par(new=T)&lt;br /&gt;plot(QQQQ.cl,ylim=c(ymin,ymax),pch=15,main="donchian bands max/avg/min")&lt;br /&gt;# overlay the close as a line (the y range is already fixed by the plot above)&lt;br /&gt;lines(QQQQ.cl)&lt;br /&gt;###################################################&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;# Create the long (up) and short (dn) signals&lt;br /&gt;sigup &lt;-ifelse(QQQQ.cl &gt; dc[,2],1,0)&lt;br /&gt;sigdn &lt;-ifelse(QQQQ.cl &lt; dc[,2],-1,0)&lt;br /&gt;&lt;br /&gt;# Lag signals to align with days in market,&lt;br /&gt;# not days signals were generated&lt;br /&gt;sigup &lt;- lag(sigup,1) # Note k=1 implies a move *forward*&lt;br /&gt;sigdn &lt;- lag(sigdn,1) # Note k=1 implies a move *forward*&lt;br /&gt;&lt;br /&gt;# Replace missing signals with no position&lt;br /&gt;# (generally just at beginning of series)&lt;br /&gt;sigup[is.na(sigup)] &lt;- 0&lt;br /&gt;sigdn[is.na(sigdn)] &lt;- 0&lt;br /&gt;&lt;br /&gt;# Combine both signals into one vector&lt;br /&gt;sig &lt;- sigup + sigdn&lt;br /&gt;&lt;br /&gt;# Calculate Close-to-Close returns&lt;br /&gt;# (QQQQ.cl is column 6 of the downloaded data, loaded by getSymbols above)&lt;br /&gt;# Note: ROC() defaults to log returns; type="discrete" would give simple returns&lt;br /&gt;ret &lt;- ROC(QQQQ.cl)&lt;br /&gt;ret[1] &lt;- 0&lt;br /&gt;&lt;br /&gt;# Calculate equity curves&lt;br /&gt;eq_up &lt;- cumprod(1+ret*sigup)&lt;br /&gt;eq_dn &lt;- cumprod(1+ret*sigdn)&lt;br /&gt;eq_all &lt;- cumprod(1+ret*sig)&lt;br /&gt;&lt;br /&gt;#graphics&lt;br /&gt;par(mfg=c(1,2))&lt;br 
/&gt;plot(eq_up,ylab="Long",col="green")&lt;br /&gt;par(mfg=c(2,2))&lt;br /&gt;plot(eq_all,ylab="Combined",col="blue",main="combined L/S equity")&lt;br /&gt;par(mfg=c(2,1))&lt;br /&gt;plot(eq_dn,ylab="Short",col="red")&lt;br /&gt;title("Modified Donchian Band Trend Following System (intelligenttradingtech.blogspot.com)", outer = TRUE)&lt;br /&gt;&lt;br /&gt;##############################################################################################################&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. As always, please use your own due diligence in all work borrowed from this site. There are some areas that I believe are not quite correct in the simulation framework; nonetheless, you have a complete script to start your own examples and backtesting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-5446265112613054225?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/5446265112613054225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/03/modified-dochian-band-trend-follower.html#comment-form' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5446265112613054225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/5446265112613054225'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/03/modified-dochian-band-trend-follower.html' title='Modified Donchian Band Trend Follower using R, Quantmod, TTR'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S5qpA5o2XAI/AAAAAAAAAMs/-IY4av-T11Y/s72-c/donch_ex.jpg' height='72' width='72'/><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-2834571804692562755</id><published>2010-02-24T13:43:00.000-08:00</published><updated>2010-02-24T17:37:09.964-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FFT (Fast Fourier Transform) of time series  -- promises and pitfalls towards trading'/><title type='text'>FFT (Fast Fourier Transform) of time series  -- promises and pitfalls towards trading</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4WeP-qxBqI/AAAAAAAAAMk/TC_VbXaHexk/s1600-h/fft_ex.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 237px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4WeP-qxBqI/AAAAAAAAAMk/TC_VbXaHexk/s400/fft_ex.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5441929722104710818" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. FFT transformed time series (EBAY) reconstructed with first three and twenty harmonics, respectively.&lt;br /&gt;&lt;br /&gt;I see quite a few traders interested in advanced signal processing techniques. It is often instructive to see why they may or may not be useful.  The concept behind Fourier analysis is that any periodic signal can be broken down into a Fourier series, i.e. a sum of suitably scaled sine and cosine waveforms (even a square wave!).  The key requirement is that the signals are periodic, which means that they repeat forwards and backwards to plus and minus infinity.  Anyone who deals with financial series knows they are aperiodic (meaning they do not repeat indefinitely).  
The FFT, or fast Fourier transform, is an algorithm that efficiently computes the discrete Fourier transform, essentially correlating the signal against sines and cosines to find the magnitude and location (frequency) of the tones that make up the signal of interest.  We can often play with the FFT spectrum by adding and removing successive tones (which is akin to selectively filtering particular tones that make up the signal) in order to obtain a smoothed version of the underlying signal.  &lt;br /&gt;&lt;br /&gt;In the posted example, I showed the effect of reconstructing the transformed waveform using only the first three tones (and cutting off all others), where we see a low passed version of the signal.  The second example includes the first 20 tones, which begins to match the signal more closely but is still a smoothed representation, which is often a nice way to isolate the smoothed signal component from high frequency noise. Notice the terms tones and harmonics are practically synonymous for purposes of this discussion (a harmonic is more specifically a multiple of the fundamental tone); both represent the spectral frequency components that sum up to make the total waveform.  The major problem that I wanted to illustrate with this simple example (among many) is the problem of 'wraparound effects.'  As I mentioned earlier, one of the requirements for properly applying a Fourier transform is that the signal is periodic or repeating, since the basis functions (sines and cosines) that the signal is correlated against are infinitely repeating functions.&lt;br /&gt;&lt;br /&gt;With that requirement, the reconstructed waveform tries its best to match the beginning and endpoints for periodic repetition. The result is severe problems at the endpoints (left and right), which are often the points we are most concerned about.  So it often pays to be cautious when hearing about applications of higher level signal processing techniques.  
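&lt;br /&gt;&lt;br /&gt;To make the wraparound problem concrete, here is a minimal Python sketch (not the code behind Fig 1). It assumes a plain linear ramp as a stand-in for a trending price series, and it swaps in a naive O(n^2) DFT from the standard library in place of a true FFT (same values, just slower). The names dft and lowpass_reconstruct are made up for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform; an FFT returns the same values faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def lowpass_reconstruct(x, harmonics):
    """Keep the mean plus the lowest `harmonics` tones, then invert the transform."""
    n = len(x)
    spectrum = dft(x)
    # Bin k and its mirror bin n - k together form one real tone, so keep both.
    kept = [c if min(k, n - k) in range(harmonics + 1) else 0
            for k, c in enumerate(spectrum)]
    return [sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


ramp = [float(t) for t in range(32)]   # trending, aperiodic stand-in series (assumption)
smooth = lowpass_reconstruct(ramp, 3)  # first three harmonics only
errors = [abs(s - r) for s, r in zip(smooth, ramp)]
print(errors[0], errors[16], errors[31])  # left endpoint, middle, right endpoint
```
&lt;/pre&gt;
The error printed for the first and last samples should be far larger than the mid-series error, because the periodic extension of a ramp jumps from its last value straight back to its first.&lt;br /&gt;&lt;br /&gt;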
There are several other requirements and limitations to applying FFT techniques, among them: a sample count of 2^n (for the common radix-2 implementations), a sampling frequency greater than twice the maximum bandwidth of the sampled signal (the Nyquist criterion), and limited spectral tone bin resolution; ignoring any of these issues can cause severe reconstruction errors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-2834571804692562755?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/2834571804692562755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/fft-of-time-series-promises-and.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/2834571804692562755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/2834571804692562755'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/fft-of-time-series-promises-and.html' title='FFT (Fast Fourier Transform) of time series  -- promises and pitfalls towards trading'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4WeP-qxBqI/AAAAAAAAAMk/TC_VbXaHexk/s72-c/fft_ex.jpg' height='72' 
width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-6879715063715346199</id><published>2010-02-22T16:59:00.000-08:00</published><updated>2010-02-22T17:16:48.911-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Time Series Calendar Heat Maps Using R'/><title type='text'>Time Series Calendar Heat Maps Using R</title><content type='html'>I came across an interesting blog that showcased &lt;a href=http://blog.revolution-computing.com/2009/11/charting-time-series-as-calendar-heat-maps-in-r.html&gt;Charting time series as calendar heat maps in R &lt;/a&gt;.  It is based upon a great algorithm created by Paul Bleicher, CMO of Humedica.  I'll let you link to the other blog to see more details on the background and original source code.&lt;br /&gt;&lt;br /&gt;I made a very small modification to allow daily % changes, rather than price values.&lt;br /&gt;&lt;code&gt;&lt;br /&gt;stock.dailychange&lt;-100*(diff(stock.data$Adj.Close,lag=1)/stock.data$Adj.Close[1:(length(stock.data$Adj.Close)-1)])&lt;br /&gt;calendarHeat(stock.data$Date[1:(length(stock.data$Date)-1)], stock.dailychange, varname="SPY daily % changes(CL-CL)")&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4Mpbx0NkJI/AAAAAAAAAMc/tVZEzfrfSKU/s1600-h/spy_ex.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 371px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4Mpbx0NkJI/AAAAAAAAAMc/tVZEzfrfSKU/s400/spy_ex.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5441238331999228050" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Calendar Heat Map for SPY time series 2005-Present&lt;br /&gt;&lt;br /&gt;What's interesting is you can see how unusual events tend to cluster (heteroscedasticity), along with a preponderance of low change days (as would be expected in close to Gaussian distributions).  
Using the regions of clustering might help warn of impending catastrophic regimes (as seen in late 08), similar to using VIX as a proxy. In addition, the 10,000 foot bird's eye view might allow you to spot pockets of order for further evaluation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-6879715063715346199?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/6879715063715346199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/time-series-calendar-heat-maps-using-r.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/6879715063715346199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/6879715063715346199'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/time-series-calendar-heat-maps-using-r.html' title='Time Series Calendar Heat Maps Using R'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4Mpbx0NkJI/AAAAAAAAAMc/tVZEzfrfSKU/s72-c/spy_ex.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-6078877665892275814</id><published>2010-02-20T09:41:00.000-08:00</published><updated>2010-02-25T15:07:11.643-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Genetic Algorithm Systematic Trading Development -- Part 3  (Python/VBA)'/><title type='text'>Genetic Algorithm Systematic Trading Development -- Part 3  (Python/VBA)</title><content type='html'>As mentioned in prior posts, it is not possible to use the standard Weka GUI to instantiate a Genetic Algorithm, other than for feature selection.  Part of the reason is that there is no generic algorithm to instantiate a fitness function.  The same flexibility that allows an infinite possible range of fitnesses also requires custom scripting. Although it is possible to write a custom class for Weka/Java, I chose to utilize Python for this example, along with an older VBA tool I developed for the back-end results summary.  Hopefully, you'll see that there are many tools that may be utilized to prototype various systems and augment the development process. &lt;br /&gt;&lt;br /&gt;The essential GA uses a 17 bit string length to encode the following rule:&lt;br /&gt;&lt;br /&gt;{if ma(m) binop ma(n) then buy}&lt;br /&gt;&lt;br /&gt;The first 8 bits are used to encode the 1st ma value. Note there are 2^n = 2^8 = 256 potential decimal values that can be used for the parameter argument. The 9th bit is a single bit that selects between the two binary operators, &gt; or &lt;, as discussed in prior posts.  The last 8 bits are used for the 2nd moving average parameter value. A simple fitness of the net dollar return was used for this example (note that Sharpe ratio and other fitness metrics could have been used). The input series is SPY, using the range from 1993-2005 daily to optimize.&lt;br /&gt;&lt;br /&gt;The Python script was essentially set up to run 40 generations of a population of size 20 using elitism and tournament selection. Although this is by no means optimal (it is quite small), it was set up using these values for illustrative purposes.  
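&lt;br /&gt;&lt;br /&gt;For readers who want to see the encoding in code, here is a minimal Python sketch of how such a 17 bit string could be decoded. It is not the original script: the function name decode_rule and the mapping of a 0 bit to the "less than" operator are assumptions, while the +1 offset and a 1 bit meaning "greater than" follow the decoding walked through for Fig 1.&lt;br /&gt;
&lt;pre&gt;
```python
import operator


def decode_rule(bits):
    """Decode a 17 bit string into (m, binop, n) for: if ma(m) binop ma(n) then buy."""
    if len(bits) != 17 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 17 character string of 0s and 1s")
    m = int(bits[:8], 2) + 1   # first 8 bits; +1 offset so a 0 day average cannot occur
    # 9th bit selects the comparison; 0 meaning "less than" is an assumption
    binop = operator.gt if bits[8] == "1" else operator.lt
    n = int(bits[9:], 2) + 1   # last 8 bits, same +1 offset
    return m, binop, n


m, binop, n = decode_rule("11011011" + "1" + "11011100")
print(m, binop.__name__, n)  # prints: 220 gt 221
```
&lt;/pre&gt;
Returning the comparison as a function from the operator module means a fitness routine can call binop(ma_m, ma_n) directly on two moving average values without any string parsing.&lt;br /&gt;&lt;br /&gt;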
When you watch the video, what you'll see is the initial population in binary encoded strings each time a generation is passed. In addition, the decoded moving average rule is shown for each selection change.  Although the video has been truncated for brevity, you should notice that the fitness number is improving each generation.  The final solution was designed to halt after a fitness did not improve over five generations.  In addition, you can see the final encoded result and a plot of the fitness convergence.&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/Fw8TFDKk92Q&amp;hl=en&amp;fs=1"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/Fw8TFDKk92Q&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;Video 1.  Optimization of MA parameters using Python GA&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4Ai1odbnnI/AAAAAAAAAK0/ihO6b8MLSBc/s1600-h/figout1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 275px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4Ai1odbnnI/AAAAAAAAAK0/ihO6b8MLSBc/s400/figout1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5440386654652833394" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1.  Final Fitness result output to console&lt;br /&gt;&lt;br /&gt;In fig 1. 
we see that the final rule converged to {if ma(220) &gt; ma(221) then Buy}.&lt;br /&gt;In addition, the final binary string is shown along with the final fitness value.&lt;br /&gt;We can decode the binary string with relative ease.&lt;br /&gt;[11011011111011100] is the 17 bit string representing the optimal fitness.&lt;br /&gt;ma1 is the 1st 8 bits = 11011011 = 219 decimal; a +1 offset was used (so as not to have a 0 day moving average) to get a resulting parameter argument of 220.&lt;br /&gt;The next bit = 1, corresponding to &gt;&lt;br /&gt;The final 8 bits (11011100 = 220 decimal, plus the +1 offset) represent the 221 argument by the same reasoning as the first.&lt;br /&gt;So the resulting rule with parameters is:&lt;br /&gt;if ma(220) &gt; ma(221) then Buy&lt;br /&gt;fitness = net$gain = $316.12&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S4AmPuEQt1I/AAAAAAAAAK8/RBEKokvMgks/s1600-h/figout2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 302px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S4AmPuEQt1I/AAAAAAAAAK8/RBEKokvMgks/s400/figout2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5440390401369356114" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 2. Fitness Convergence&lt;br /&gt;&lt;br /&gt;In fig 2. we see how the fitness continued to climb over successive generations until early convergence caused a halt at the fitness value that did not change over the prior 5 generations.&lt;br /&gt;&lt;br /&gt;In order to verify the results, we will also show how other tools may be used. 
In this case, I used an older VBA simulator that I wrote a few years back.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/Q6SRTE-m13s&amp;hl=en&amp;fs=1"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/Q6SRTE-m13s&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;Video 2.  Summary of optimized parameters using VBA/Excel&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4BzCCd_h9I/AAAAAAAAAME/B6UBIU1vZvM/s1600-h/fig3summaryc.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 289px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4BzCCd_h9I/AAAAAAAAAME/B6UBIU1vZvM/s400/fig3summaryc.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5440474828723161042" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Summary of Back Test Results&lt;br /&gt;&lt;br /&gt;Above is a capture of the summary statistics using the back test program.  Total net profit is slightly higher than the python results. This is due to the fact that the python simulation truncated the series length of the moving average data, so as to avoid zero front padded values, while the excel program did not. However, they are still in close agreement. It's often useful to use several different programs to force yourself to double check results.&lt;br /&gt;&lt;br /&gt;Now, as an astute commenter already pointed out... this method is indeed curve fitting. 
What we found was the best possible pair of parameters (or at least one of the best; there are superior parameters, but I didn't run the example generation set too long) for the particular rule set we set out to investigate.  Or as I mentioned in the first thread, we zeroed in on the region of the distribution curve with the most profitable candidates.  Now, for those of you not familiar with curve fitting, it is not a happy concept amongst developers.  In fact, it suffers from almost the same egregious problems as cherry picking examples, as I mentioned earlier on.&lt;br /&gt;&lt;br /&gt;That being said, however, it is not done in vain.  Our goal here is to quantitatively augment common development tools (the part where you create and verify) beyond mere guessing, intuition, and cherry picking.  Firstly, it is true that this particular rule set may not fare as well out of sample. However, in the same sense that we cannot just take one cherry picked example for granted, we must also evaluate how things actually do perform out of sample.  I say this because I've used similar techniques that looked very good, and did indeed perform very well out of sample for several periods out into the future.  By homing in on the best candidates, we help to narrow down the set of candidates that are worthy of out of sample investigation.  There are other additional techniques (some mentioned earlier, such as ensemble methods, different objective/fitness functions, and even different optimization criteria) that can be used to enhance this method and, in addition, verify robustness out of sample.&lt;br /&gt;&lt;br /&gt;edit:  Just for giggles, I decided to actually run the Out of Sample performance on this optimized in sample trained rule.  
The following chart illustrates how it performed 'out of sample' for the years 2005-today(2010).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4CI2Q3EVvI/AAAAAAAAAMU/R1CDmOsK7Vc/s1600-h/training_test.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 210px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4CI2Q3EVvI/AAAAAAAAAMU/R1CDmOsK7Vc/s400/training_test.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5440498815683811058" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 4. Out of Sample Test Performance on optimized training rule parameters.&lt;br /&gt;&lt;br /&gt;Not all that shabby for that curve fitted simple system during the worst meltdown in recent history, eh (much easier on the gut)?&lt;br /&gt;&lt;br /&gt;To be frank, I have run so many evaluations on simple SMA systems, that I would say that they are not the most superior parameters to optimize around. Obviously, however, it really depends on what your objective is. There are some long term studies that have shown using the fitness objective of reduced volatility as the goal is quite beneficial with this simple rule set (you can verify that this simple system had far less volatility over the down periods, than the actual market-- in and out of sample) .  It is up to you to find those parameters that are worthy of optimizing further. 
See the &lt;a href="http://blog.fosstrading.com/2010/02/updated-tactical-asset-allocation.html"&gt;commentary on A Quantitative Approach to Tactical Asset Allocation&lt;/a&gt; for a related example.&lt;br /&gt;&lt;br /&gt;As always, please do your own due diligence before making any trading decisions.&lt;br /&gt;&lt;br /&gt;And please continue to give your feedback on what you like or don't like and areas you want to explore.&lt;br /&gt;---------------------------------------------------------------------------------&lt;br /&gt;If you are new to Python and would like to order a fantastic textbook, I highly recommend the following (applications geared a bit towards science and engineering): &lt;a href=http://www.amazon.com/Scientific-Programming-Computational-Science-Engineering/dp/3642024742/ref=sr_1_26?ie=UTF8&amp;s=books&amp;qid=1266694073&amp;sr=8-26&gt;A Primer on Scientific Programming with Python &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4A5XOSvquI/AAAAAAAAALM/9TdVF45YVYM/s1600-h/pythonad.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 260px; height: 355px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S4A5XOSvquI/AAAAAAAAALM/9TdVF45YVYM/s400/pythonad.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5440411421000051426" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In addition, users who are interested in learning a bit more about VBA with a Financial Oriented slant will find great practical examples in the text: &lt;a href=http://www.amazon.com/Financial-Modeling-3rd-Simon-Benninga/dp/0262026287/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266694665&amp;sr=1-1&gt; Financial Modeling, 3rd Edition &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S4A6pnS28hI/AAAAAAAAALU/r6HWH3MrE9U/s1600-h/finanialmodellingad.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 287px; height: 400px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S4A6pnS28hI/AAAAAAAAALU/r6HWH3MrE9U/s400/finanialmodellingad.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5440412836460687890" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-6078877665892275814?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/6078877665892275814/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/genetic-algorithm-systematic-trading_20.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/6078877665892275814'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/6078877665892275814'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/genetic-algorithm-systematic-trading_20.html' title='Genetic Algorithm Systematic Trading Development -- Part 3  (Python/VBA)'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7YSZm5NIAmQ/S4Ai1odbnnI/AAAAAAAAAK0/ihO6b8MLSBc/s72-c/figout1.jpg' height='72' 
width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3375438094422201065</id><published>2010-02-17T15:39:00.000-08:00</published><updated>2010-02-17T23:17:56.768-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Genetic Algorithm Systematic Trading Development-- Part 2'/><title type='text'>Genetic Algorithm Systematic Trading Development-- Part 2</title><content type='html'>We started by discussing the goal of a genetic algorithm, which is to efficiently find the pool of candidate rules that are superior to other potential rules.  In our example of moving averages, we are seeking the parameter values of the rule:&lt;br /&gt;if ma(m) binop ma(n) then action.&lt;br /&gt;*Note: binop is short for binary operator; in this case the binary operator is &amp;gt; or &amp;lt;.&lt;br /&gt;&lt;br /&gt;The GA (genetic algorithm) works by encoding the rule set into a string of binary valued variables.  For instance, if we wanted to encode a moving average parameter that can take one of 4 decimal values, we could simply use a 2 bit string, where 00 = 0 decimal, 01 = 1 decimal, 10 = 2 decimal and 11 = 3 decimal.  A string of n bits can encode up to 2^n values. If we wanted to encode 512 values, we would need a 9 bit string (2^9 = 512). &lt;br /&gt;&lt;br /&gt;Also, we can encode values other than decimal values as binary bits. For instance, action = buy or sell can be represented by 0 or 1.  Greater or less than (&amp;gt; or &amp;lt;) can be represented by 1 or 0, as well.  In the end we will have a chromosome, or total string, that represents the rule we are trying to optimize.  So the rule: if {ma(m) binop ma(n) then action} could be encoded by binary values, with each chromosome laid out as 4 bits - 2 bits - 4 bits - 2 bits, where each element of the rule string corresponds to the encoded values discussed above.  Note that the encoded blocks would be comparable to genes.  
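&lt;br /&gt;&lt;br /&gt;To make the encoding idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the code used for this series: the function names are hypothetical, and the gene widths (9 bits per moving average for lengths 1 to 512, 1 bit for the operator, 1 bit for the action) are my own choice here and differ from the 4-2-4-2 layout described above.&lt;br /&gt;&lt;pre&gt;
```python
# Illustrative sketch only: names and gene widths are assumptions, not the post's code.
MA_BITS = 9  # 2**9 = 512 possible moving-average lengths (1..512)


def encode_rule(m, n, greater_than, sell):
    """Pack 'if ma(m) binop ma(n) then action' into one bit string (chromosome)."""
    genes = [
        format(m - 1, "0{}b".format(MA_BITS)),  # gene 1: first MA length
        "1" if greater_than else "0",           # gene 2: 1 = greater than, 0 = less than
        format(n - 1, "0{}b".format(MA_BITS)),  # gene 3: second MA length
        "1" if sell else "0",                   # gene 4: 0 = buy, 1 = sell
    ]
    return "".join(genes)


def decode_rule(chromosome):
    """Unpack a chromosome back into (m, n, greater_than, sell)."""
    m = int(chromosome[0:MA_BITS], 2) + 1
    greater_than = chromosome[MA_BITS] == "1"
    n = int(chromosome[MA_BITS + 1:2 * MA_BITS + 1], 2) + 1
    sell = chromosome[2 * MA_BITS + 1] == "1"
    return m, n, greater_than, sell


# Example rule: if ma(20) is greater than ma(200) then sell
chromosome = encode_rule(20, 200, greater_than=True, sell=True)
print(chromosome, decode_rule(chromosome))
```
&lt;/pre&gt;
Decoding simply reverses the packing, so any 20 bit string produced by crossover or mutation still maps to a valid rule under this layout.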
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S3zTcS-oR-I/AAAAAAAAAKk/BsbHdsx8b6Y/s1600-h/fig_2boolencodinga.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 383px; height: 400px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S3zTcS-oR-I/AAAAAAAAAKk/BsbHdsx8b6Y/s400/fig_2boolencodinga.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5439454933041039330" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Examples of Boolean Encoding&lt;br /&gt;&lt;br /&gt;Once we encode our rule set into a boolean representation or string, we then want to generate a population of strings to select from.  Typically, we start out by assigning random values to the parameters.  For instance, we may start with a population of 100 strings, each string representing a rule with different parameters.&lt;br /&gt;One string could be if ma(10) &amp;lt; ma(50) then buy, another might be if ma(20) &amp;gt; ma(200) then sell.   Once the initial population has been created, we need to create diversity and, in addition, improve fitness in the offspring over successive generations.&lt;br /&gt;&lt;br /&gt;The concept of fitness is perhaps one of the most elegant and flexible options that makes the GA such a powerful optimizer.  In the decision tree learners and Neural Network learners we discussed, there are only one or two simple goals to train on (a decision tree, for instance, trains towards the goal of reducing information entropy; a neural net trains on reducing fitted variance errors).  The GA can use nearly any goal you can quantify, which gives it much more flexibility than the others.  You could use total gain as a goal, or Sharpe ratio, or profit factor. You could even combine goals.  The fitness or goal is what you are trying to optimize.  
Keep in mind that a genetic algorithm, like any other learner, does not guarantee you will find the absolute best; it may get caught in local maxima of the fitness landscape.&lt;br /&gt;However, you can get more sophisticated and add other sub methods to try to avoid this.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3zplXbp2HI/AAAAAAAAAKs/eLOnBUfpa4c/s1600-h/fig3populationa.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 206px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3zplXbp2HI/AAAAAAAAAKs/eLOnBUfpa4c/s400/fig3populationa.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5439479278111152242" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. Example of population of rules to be processed.&lt;br /&gt;&lt;br /&gt;Once our initial population of parameter based rules has been created (randomly), we then want to think about how we find the set of rule parameters that best optimizes our fitness.  Note that each time we create a new population of offspring, we call this a new generation run or epoch.  From the first population (the parents), we commonly select some sample of the fittest members to be passed along to the next population.  We could use a greedy method that just sorts or ranks the members and selects only from the best 50% to be passed to the next generation (known as truncation selection), or alternatively use something called roulette selection.  In roulette selection, we sample members of the original population based upon their normalized fitness. So if the best fitness was 20% of the sum of all the fitnesses, we would copy that string or rule into the next generation with a 20% probability.  The same would be applied to the other fitness/string combinations. 
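&lt;br /&gt;&lt;br /&gt;The two selection schemes just described can be sketched in a few lines of Python. This is a hedged illustration rather than the implementation behind this series: the function names, the toy rule labels, and the fitness numbers are all made up, and the roulette version assumes non-negative fitness values with a positive sum.&lt;br /&gt;&lt;pre&gt;
```python
import random


def truncation_select(population, fitnesses, keep_fraction=0.5):
    """Greedy selection: rank by fitness and keep only the top fraction."""
    ranked = sorted(zip(fitnesses, population), key=lambda pair: pair[0], reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return [member for _, member in ranked[:keep]]


def roulette_select(population, fitnesses, k, rng=random):
    """Sample k members with probability equal to their normalized fitness.

    Assumes every fitness is non-negative and the total is positive.
    """
    total = float(sum(fitnesses))
    weights = [f / total for f in fitnesses]
    return rng.choices(population, weights=weights, k=k)


# Toy population: rule "A" holds 20% of the total fitness (20 out of 100).
rules = ["A", "B", "C", "D"]
fitness = [20, 50, 25, 5]

print(truncation_select(rules, fitness))
print(roulette_select(rules, fitness, k=10, rng=random.Random(0)))
```
&lt;/pre&gt;
Note that truncation selection never passes along the weakest members, while roulette selection still gives them a small chance, which preserves a little more diversity.&lt;br /&gt;&lt;br /&gt;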
Ultimately, we end up with more of the offspring selected from the fittest candidates in the prior generation.  Next, we want to ensure some diversity in the offspring. The crossover operation achieves this by swapping genes between one candidate and another.  This is performed over the entire population to ensure diversity.  Lastly, we use mutation to randomly select some number of string elements and flip them.  It adds a bit more random diversity to the offspring, so that possibly some candidate may show up unexpectedly that has great performance (think unusual height in basketball for instance).&lt;br /&gt;More bells and whistles can be added to improve performance. Tournament selection is another method that improves offspring selection by running a tournament between string candidates.  The winning candidate gets passed along to the next generation.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3yGhy201zI/AAAAAAAAAJ0/-iGJR8CDS70/s1600-h/fig4selection.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 236px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3yGhy201zI/AAAAAAAAAJ0/-iGJR8CDS70/s400/fig4selection.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5439370365102249778" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Selection process.&lt;br /&gt;&lt;br /&gt;We essentially run the selection and diversity routines through each generation, and the best candidates get passed down to the next generation until our number of generations has run out or an early stopping criterion we specified is met.&lt;br /&gt;&lt;br /&gt;In the case of our rule set, we expect it to converge toward the best set of parameters it can find (moving average arguments, and the binary greater than or less than operator), based upon the fitness goal we assign to it.&lt;br /&gt;&lt;br /&gt;Next: 
Genetic Algorithm Systematic Trading Development-- Part 3&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-3375438094422201065?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3375438094422201065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/genetic-algorithm-systematic-trading_17.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3375438094422201065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3375438094422201065'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/genetic-algorithm-systematic-trading_17.html' title='Genetic Algorithm Systematic Trading Development-- Part 2'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7YSZm5NIAmQ/S3zTcS-oR-I/AAAAAAAAAKk/BsbHdsx8b6Y/s72-c/fig_2boolencodinga.jpg' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-599592813760537119</id><published>2010-02-15T15:45:00.000-08:00</published><updated>2010-02-16T01:49:02.918-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Genetic Algorithm Systematic Trading Development -- Part 1'/><title type='text'>Genetic Algorithm Systematic Trading Development -- Part 
1</title><content type='html'>I want to start with a brief introduction to what I consider one of the most powerful learning methodologies to come out of Artificial Intelligence in the last several decades-- the Genetic Algorithm.  Although it was originally developed to model evolutionary biology in the late 50s, most give credit to John Holland for his detailed contributions to the development of the field.  A professor of adaptive systems at the University of Michigan, he wrote a text, titled "Adaptation in Natural and Artificial Systems" in 1975, that is considered a landmark book in the field.&lt;br /&gt;&lt;br /&gt;Although GAs are designed to borrow from our Genetic Biology, I want to quickly describe why they are powerful with respect to trading systems development. As a trader, you might often develop systems using creative ideas borrowed from Technical Analysis books you may have read.  One of the problems with earlier TA books in general, IMO, is that they often have "cherry picked" examples of parameter sets, without much explanation as to how they arrived at the parameters, nor how well they might perform against other parameters. In statistics, we are often interested in gathering many different samples to build a distribution profile of the outcomes as an estimate of the true population of all possible candidate solutions. We often look at these distributions of models to gather a quantitative deduction about whether our particular system (with the parameters we selected) has performed better than any other potential system in the universe of all possible candidate solutions.&lt;br /&gt;&lt;br /&gt;If the system performed better than some designated percentage of the sample distribution area of 100% (often set at 1% or 5% in common literature), then we can say that the result compared to the universe of candidates is "statistically significant".  
Using different parameters for the same set of systematic rules will give different sample outcomes that make up that distribution.  For instance, using moving average crossovers, we might end up selecting one pair of moving average values to determine entry and exit with a resulting profit of 0.1%, while another combination over the same period yielded 2.3%.  Ultimately we want to find the set of pairs that performs the best, or at least significantly better than Buy and Hold; otherwise there's typically not much incentive to trade in and out, as commission costs and other negative effects make it prohibitive.  We could try to run various parameters by guessing or enumerating over the search space of potential solutions, but at a certain point, the number of combinations becomes unwieldy and is not computationally efficient.  The first step might be to evaluate the parameters of our system and look for those parameters that yield statistically significant results; the next might be to compare that candidate to buy and hold or other potential system candidates using a t-test of the separate distributions.&lt;br /&gt;&lt;br /&gt;Let's take an example of a potential set of rules to illustrate this idea.  Suppose we sat down one day and decided upon a rule that said to buy if the m period moving average was greater or less than the n period moving average. First, we need to decide upon what range of values to use for the averages. If we discretize the range of values to integer values, i.e. 1 to 512 steps each, we would have 512 x 512 x 2 (where 2 represents greater or less than) = 524,288 different parameter combinations to enumerate through (or try).  Although that doesn't seem too large a number of combinations to try with today's computational power, as we begin to make the rules more complex, the number of combinations will begin to run into the millions.  
It's just not feasible to try all of them, so we want to find some method to reduce the number of potential candidates, while at the same time finding the best possible results.  What we are trying to do is find an 'optimal' algorithm to converge to the best solutions quickly.  There are numerous algorithms employed in the field of machine learning, under the category of optimization algorithms that exist to achieve this goal.  The genetic algorithm is one such optimization algorithm that borrows directly from our own evolutionary genetic system to find the best potential candidate, without having to literally try out every single possible combination.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3nw7bhFdLI/AAAAAAAAAHE/qhSfqOPaVuI/s1600-h/fig1_statistics_graph.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 226px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3nw7bhFdLI/AAAAAAAAAHE/qhSfqOPaVuI/s400/fig1_statistics_graph.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5438642928816059570" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fig1. Example of searching for statistically superior parameters.&lt;br /&gt;&lt;br /&gt;Above, we see an example distribution of possible candidate solutions in the population of potential parameter pairs with the x-axis representing binned ranges of potential gain for the system, and y representing the frequency of parameter pair outcomes corresponding to that gain.  
Our Genetic Algorithm will help us to find those solutions that are statistically significant compared to potential solutions.&lt;br /&gt;&lt;br /&gt;Next: Genetic Algorithm Systematic Trading Development -- Part 2&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-599592813760537119?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/599592813760537119/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/genetic-algorithm-systematic-trading.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/599592813760537119'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/599592813760537119'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/genetic-algorithm-systematic-trading.html' title='Genetic Algorithm Systematic Trading Development -- Part 1'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3nw7bhFdLI/AAAAAAAAAHE/qhSfqOPaVuI/s72-c/fig1_statistics_graph.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-1650523527424352170</id><published>2010-02-11T09:41:00.000-08:00</published><updated>2010-02-14T15:46:37.316-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Artificial Immune Systems and Financial Applications?'/><title type='text'>Artificial Immune Systems and Financial Applications?</title><content type='html'>One of the buzzwords that seems to be common these days is AIS or Artificial Immune Systems.  It is a biologically inspired classification type system that essentially tries to replicate some of our own natural immune system algorithms.  Our bodies have various defense mechanisms for recognizing foreign invaders.  One such defense mechanism is utilizing pathogen-associated molecular patterns built into our genome so that we can identify foreign pathogens and respond to them.  This concept of being able to distinguish between self and non-self is essentially one of the main themes that is borrowed from our natural biological defense.&lt;br /&gt;&lt;br /&gt;The idea is that we can learn to recognize objects that are not normal and respond to them.  A good example where this has been put to use is in spam detection.  We want to be able to distinguish good (known) mail from the (unknown) bad, and utilize AIS to avoid parasitic spam.  &lt;br /&gt;&lt;br /&gt;The most commonly adopted AIS method is the Negative Selection Algorithm. It essentially works by randomly generating candidate negative examples; if a candidate (imagine a coordinate on a 2D grid) happens to be located near enough to a known good example (using some distance metric, such as Euclidean or Mahalanobis distance), then it is rejected.  
Essentially, we are randomly generating negative examples for the classifier to identify.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3RLqQ5i7RI/AAAAAAAAAGs/Gi4TopgF9og/s1600-h/AIS_example.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 320px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3RLqQ5i7RI/AAAAAAAAAGs/Gi4TopgF9og/s400/AIS_example.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5437053839605951762" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Basic concept of AIS type classification&lt;br /&gt;&lt;br /&gt;Although it has been used in some cases for applications like credit detection, one of the things you'll notice in AI based algorithms is that there are many other algorithms that might do the same job or better.  In the case of trading systems, we might want to identify abnormal time series behavior and make a decision based upon this information; however, it might be simpler to use statistical control methods to ascertain the same information.&lt;br /&gt;&lt;br /&gt;In conclusion, although the idea sounds promising, I haven't seen many practical examples where it benefits a trading system in a way that existing AI algorithms cannot.&lt;br /&gt;&lt;br /&gt;&lt;a href = "http://www.artificial-immune-systems.org/"&gt; artificial immune systems &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There is a wealth of good reading in the linked site, as well as a Weka plugin that can be used to access this algorithm for readers following the Weka tutorials.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-1650523527424352170?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://intelligenttradingtech.blogspot.com/feeds/1650523527424352170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/artificial-immune-systems-and-financial.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1650523527424352170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/1650523527424352170'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/artificial-immune-systems-and-financial.html' title='Artificial Immune Systems and Financial Applications?'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3RLqQ5i7RI/AAAAAAAAAGs/Gi4TopgF9og/s72-c/AIS_example.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-8204799425553374551</id><published>2010-02-10T23:21:00.001-08:00</published><updated>2010-02-14T15:46:59.410-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Using J48 Decision Tree Classifier to Dynamically'/><title type='text'>Using J48 Decision Tree Classifier to Dynamically Allocate Next Day Position in Stocks or Bonds</title><content type='html'>The prior introduction using a simple model to determine next weeks change based on the S&amp;P 500 index and VIX did not look very promising, although hopefully it served to familiarize yourself with how classification is used in augmenting trading decisions.  
Wouldn't it be nice if we had something that performed a little better?&lt;br /&gt;&lt;br /&gt;Well, let's look at an application of using a decision tree type classification in order to predict whether to invest in stocks or bonds one day ahead of time.&lt;br /&gt;We will use a very simple input model stimulus in order to arrive at a decision.&lt;br /&gt;The following will be used as input attributes.&lt;br /&gt;1) VIX 1 day change&lt;br /&gt;2) TLT 1 day change&lt;br /&gt;3) SPY 1 day change&lt;br /&gt;4) VIX 5 day momentum&lt;br /&gt;5) TLT 5 day momentum&lt;br /&gt;6) SPY 5 day momentum&lt;br /&gt;&lt;br /&gt;The VIX is used as a volatility proxy to measure fear, which leads (presumably) to flights to safer instruments (bonds).&lt;br /&gt;&lt;br /&gt;The TLT is the iShares Barclays 20+ Year Treas Bond ETF used to track treasury bonds with an average duration of 20 years. &lt;br /&gt;&lt;br /&gt;The SPY is an ETF that tracks the general market index: S&amp;P500. &lt;br /&gt;&lt;br /&gt;The remaining 5 day momentum attributes are simply nominal attributes of UP or DN used to generally ascertain the momentum of the index over the last 5 days. In addition to the input attributes, we append one output attribute which is the superior instrument to invest in the following day-- SPY or TLT (stocks or bonds).  This is what we are trying to predict and decide upon.  
The training and testing data sample is from the period 7/31/2002 up until present.&lt;br /&gt;&lt;br /&gt;After entering the information into Weka (via .csv, see prior tutorials), we will choose the J48 decision tree learner and use a 90%/10% training/test split in order to develop a model tree that will predict which class of instrument to invest in based upon the prior day's input stimulus.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3OwGGm8zRI/AAAAAAAAAGU/c1qPX_Dntrg/s1600-h/fig1_tree2_raw2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 239px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3OwGGm8zRI/AAAAAAAAAGU/c1qPX_Dntrg/s400/fig1_tree2_raw2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5436882794066005266" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Resulting Model Decision Tree&lt;br /&gt;&lt;br /&gt;The decision tree can be read from the top down as making a decision based upon certain conditions.  I.e. 
If we traverse the far left branch for example, it would give us the following rule:&lt;br /&gt;IF 5 day SPY momentum is DN and 1 day TLT change is &amp;lt;= 0.91% and 5 day TLT momentum is UP and 5 day VIX momentum is UP then&lt;br /&gt;invest the next day in SPY.&lt;br /&gt;&lt;br /&gt;We can traverse each branch similarly to obtain an all encompassing set of rules to make a decision on what to invest in the following day.&lt;br /&gt;Although the tree looks a bit daunting, if you can program the rule set into your favorite language, it is a simple matter for the algorithm to take that model and process it forward.&lt;br /&gt;&lt;br /&gt;Finally, we want to see if the prediction scheme was any better or worse than guessing.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3OweN5Lu4I/AAAAAAAAAGc/CGROTCjNPiA/s1600-h/fig2_results_raw2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 317px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3OweN5Lu4I/AAAAAAAAAGc/CGROTCjNPiA/s400/fig2_results_raw2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5436883208338389890" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. 90/10 split train/validation results of J48 Model Tree&lt;br /&gt;&lt;br /&gt;The results are pretty good.  Using a very simple and intuitive model, we were able to select the better instrument to buy with a 59% success rate on the 10% out of sample validation set.  The same type of methodology can be used to select between trading systems with a little ingenuity.  
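&lt;br /&gt;&lt;br /&gt;Returning to the far left branch read off the tree above, programming it is straightforward. The Python sketch below is only an illustration: the function and argument names are hypothetical, only that single branch is covered (the rest of the tree is not reproduced here), and the 1 day TLT change is assumed to be expressed in percent.&lt;br /&gt;&lt;pre&gt;
```python
def far_left_branch(spy_mom_5d, tlt_chg_1d_pct, tlt_mom_5d, vix_mom_5d):
    """Return "SPY" when the far left branch of the tree fires, otherwise None.

    Momentum arguments are the nominal labels "UP" or "DN";
    tlt_chg_1d_pct is the 1 day TLT change in percent (assumed units).
    None means this branch does not apply and another branch must decide.
    """
    if (spy_mom_5d == "DN"
            and 0.91 >= tlt_chg_1d_pct   # 1 day TLT change is at most 0.91%
            and tlt_mom_5d == "UP"
            and vix_mom_5d == "UP"):
        return "SPY"
    return None


print(far_left_branch("DN", 0.40, "UP", "UP"))  # branch fires
print(far_left_branch("UP", 0.40, "UP", "UP"))  # branch does not apply
```
&lt;/pre&gt;
A full implementation would chain one such test per branch, so that every combination of inputs ends in either SPY or TLT.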
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3Oyme7iNbI/AAAAAAAAAGk/s08gw1gd2aQ/s1600-h/fig3_eq_curve.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 315px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3Oyme7iNbI/AAAAAAAAAGk/s08gw1gd2aQ/s400/fig3_eq_curve.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5436885549373863346" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Equity Curve comparison of Learner System to investment classes on out of sample data&lt;br /&gt;&lt;br /&gt;Finally, we take a look at the equity curves of:&lt;br /&gt;1) The classifier system we modeled&lt;br /&gt;2) Investing in SPY or TLT alone (Stocks or Bonds)&lt;br /&gt;3) Investing half in each&lt;br /&gt;&lt;br /&gt;Notice that the terminal wealth from our system only slightly beats all of the other approaches.  It's a good example of how you might have a good hit rate and only moderate improvement in net results, since hit rate does not account for magnitude.  In addition, the costs associated with commission and slippage from trading in and out many times would likely overcome the systematic edge. Later on, as we discuss Genetic Algorithms, we will see there are many other ways to optimize. 
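&lt;br /&gt;&lt;br /&gt;To make the point about hit rate and magnitude concrete, here is a small sketch using made-up daily returns (these are not the results shown above). A system that is right 60% of the time can still lose money if its losses are larger than its wins.&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical daily returns of a system: six small wins, four larger losses.
returns = [0.004, 0.003, 0.005, 0.002, 0.004, 0.003,
           -0.010, -0.008, -0.012, -0.009]

# Hit rate only counts how often the sign was right.
wins = sum(1 for r in returns if r > 0)
hit_rate = wins / len(returns)

# Terminal wealth of 1 unit compounded through every trade.
wealth = 1.0
for r in returns:
    wealth *= 1.0 + r

print(f"hit rate: {hit_rate:.0%}")
print(f"terminal wealth: {wealth:.4f}")
```
&lt;/pre&gt;
Commission and slippage would be subtracted from every one of these returns, which lowers terminal wealth further while leaving the hit rate largely unchanged.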
&lt;br /&gt;&lt;br /&gt;As always, please do your own due diligence and thoroughly verify any results you may use to make decisions in your own trading.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-8204799425553374551?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/8204799425553374551/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/prior-introduction-using-simple-model.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8204799425553374551'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8204799425553374551'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/prior-introduction-using-simple-model.html' title='Using J48 Decision Tree Classifier to Dynamically Allocate Next Day Position in Stocks or Bonds'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3OwGGm8zRI/AAAAAAAAAGU/c1qPX_Dntrg/s72-c/fig1_tree2_raw2.jpg' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3995826417168847495</id><published>2010-02-08T10:05:00.000-08:00</published><updated>2010-02-14T15:47:12.248-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Classification for 
stock directional prediction'/><title type='text'>Classification for stock directional prediction</title><content type='html'>The neural network tutorial focused on a type of method known as regression.  The other common method utilized in machine learning is called classification.  The two approaches are somewhat similar in that both fit a curve to a set of data. The difference lies in how they use that curve. In the case of regression, we are typically minimizing the distance between each exemplar and the fitted curve, whereas in classification, we are trying to discriminate between separate classes by region.&lt;br /&gt;&lt;br /&gt;Although the following example is, if anything, a demonstration of market efficiency (i.e. not much edge in terms of prediction), it serves to illustrate the basic idea of classification as applied to market prediction.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3BaHBhL0hI/AAAAAAAAAFs/LLgSzCMombE/s1600-h/classa1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 295px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3BaHBhL0hI/AAAAAAAAAFs/LLgSzCMombE/s400/classa1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5435943826949394962" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. S&amp;P 500 weekly change vs. VIX&lt;br /&gt;&lt;br /&gt;In the figure above, we see a common scatterplot depiction of the S&amp;P 500 weekly return vs the VIX (which is a proxy for volatility).  One common observation from regression is that the S&amp;P 500 is negatively correlated with the VIX. Or qualitatively, large positive changes in the VIX tend to coincide with negative changes in the S&amp;P index; this is one reason it's often known as the fear index (since a rise in the VIX is associated with negative returns in the S&amp;P 500). 
If we were to run a regression, we would quantify this relationship by the slope of the fitted line and the correlation coefficient R, both of which, as can be seen here visually, are negative. (R^2 itself is never negative; it only measures how much of the variation the line explains.)&lt;br /&gt;&lt;br /&gt;But the regression observation says nothing about prediction into the future; it only says that there is a negative relationship between the two values at any given sample instant (in this case, weekly samples). One way to set up the prediction problem for illustration would be to use the current changes in both the S&amp;P 500 index and the VIX to predict the next week's change using a classification method.&lt;br /&gt;&lt;br /&gt;The plot shows both UP and DN changes one week later, depicted by green and red labels. Notice that the outcome of the prediction here is nominal and not numerical, which is another common distinguishing feature between classification and regression schemes.  Common methods used to deploy classification schemes are decision trees, support vector machines, and most of the tools that are also used in regression.  Ideally, if the classification scheme were able to discriminate classes well, it would separate them by a curve or some other function that isolates both in sample and out of sample classes with good separation.&lt;br /&gt;&lt;br /&gt;Unfortunately, when the data is very noisy, a classifier is not able to separate the classes very well.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3BXB7ujV2I/AAAAAAAAAFc/gCusXDrsCe8/s1600-h/classplot1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 277px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S3BXB7ujV2I/AAAAAAAAAFc/gCusXDrsCe8/s400/classplot1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5435940440960620386" /&gt;&lt;/a&gt;&lt;br /&gt;Fig 2. 
Plot of classification values for S&amp;P 500 UP and DN against VIX&lt;br /&gt;&lt;br /&gt;We can see that there is so much overlap in the UP and DN regions that it would be hard to find a curve that would classify the regions with good separation.&lt;br /&gt;&lt;br /&gt;We use a common decision tree learning scheme called J.48 to attempt to predict out of sample results for the classifier.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3BXlKjDQcI/AAAAAAAAAFk/_eRSTRUtpDQ/s1600-h/wekaclassout.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 329px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3BXlKjDQcI/AAAAAAAAAFk/_eRSTRUtpDQ/s400/wekaclassout.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5435941046234333634" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Out of sample classification results&lt;br /&gt;&lt;br /&gt;Using the same 66% in sample (training) scheme as in the NN example, we see that the predictive learner had a 53% success rate out of sample.  If we compare this result to a simple naive learner (often used as a benchmark) that uses the last result as the prediction, we get identical results.  The upshot is that, using the information we have, the markets have proved efficient against this simple prediction method.&lt;br /&gt;&lt;br /&gt;The classification concept may be extended to other applications (such as regime detection, system selection, artificial immune systems, or using multivariate input attributes) with some creativity, but the goal here was to give a simple introduction to the concept, as it is one of the most important learning concepts in machine learning.  
Classification may also employ supervised or unsupervised methods-- in this case it was using supervised learning (training by examples).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-3995826417168847495?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3995826417168847495/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/classification-for-prediction.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3995826417168847495'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3995826417168847495'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/classification-for-prediction.html' title='Classification for stock directional prediction'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_7YSZm5NIAmQ/S3BaHBhL0hI/AAAAAAAAAFs/LLgSzCMombE/s72-c/classa1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-634713089202016730</id><published>2010-02-07T20:13:00.000-08:00</published><updated>2010-02-14T15:52:37.673-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Practical Implementation of Neural Network based time series (stock) prediction  -PART 5'/><title 
type='text'>Practical Implementation of Neural Network based time series (stock) prediction  -PART 5</title><content type='html'>Following is an example of what it looks like to predict an actual univariate price series.  The period of the signal that was sampled was already in stationary form, so not much massaging was needed other than normalization (described earlier).&lt;br /&gt;&lt;br /&gt;What's important to notice when you see these kinds of neural network predictions (particularly in marketing snapshots for software vendors or trading book examples) is that they look fantastic out of sample from a bird's eye view. Unfortunately, the devil is always in the details. If you zoom way in, the predictions are not as accurate as the larger picture portrays.  A more accurate way to assess how well the prediction performed is to look at the percentage change of each predicted value.  We can simply compare the sign of the actual percentage change to the sign of the predicted change.  In this case, the out of sample test results had a 43% hit rate, which is worse than a naive predictor would achieve.  The good news is you can flip those results, and just predict the opposite direction to get a 57% hit rate.  However, you always have to be careful to do due diligence to verify the robustness of these types of predictions over many conditions.  Another thing to be careful about is that hit rate only gives you the fraction of correct predictions and tells you nothing about the magnitude of each move, which matters for having a positive net expectation.  
The type of result you see here, however, is common for predicting specific univariate time series data values.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2-RlkI9fBI/AAAAAAAAAFM/FGT5v5caFaA/s1600-h/stock2resutls.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 202px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2-RlkI9fBI/AAAAAAAAAFM/FGT5v5caFaA/s400/stock2resutls.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5435723349801925650" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Stock Prediction with out of sample region highlighted&lt;br /&gt;&lt;br /&gt;You now have a practical example to get you started with building your own prediction system with free tools (except excel, which you likely have), and some ideas and methods to build your own prediction system.  Any professional software you purchase will not differ much other than using different attributes to train on or modifying the internal architecture of the neural network.  
I have not shown more detailed examples on advanced techniques, but might incorporate some later if there is demand.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-634713089202016730?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/634713089202016730/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/practical-implementation-of-neual.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/634713089202016730'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/634713089202016730'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/practical-implementation-of-neual.html' title='Practical Implementation of Neural Network based time series (stock) prediction  -PART 5'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2-RlkI9fBI/AAAAAAAAAFM/FGT5v5caFaA/s72-c/stock2resutls.jpg' height='72' width='72'/><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-237779198559676937</id><published>2010-02-04T13:23:00.000-08:00</published><updated>2010-02-14T15:52:59.739-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Practical Implementation of Neural Network based t'/><title 
type='text'>Practical Implementation of Neural Network based time series (stock) prediction  -PART 4</title><content type='html'>Consider this an introduction to how we need to pre-process the data.&lt;br /&gt;I mentioned earlier that a financial time series is typically a unit root or non-stationary signal. This means that if you sample its statistical properties (such as the average) at different times, they will change.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2s8cxU6fDI/AAAAAAAAAEc/T_QarVM5eUI/s1600-h/fig1sp500.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 193px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2s8cxU6fDI/AAAAAAAAAEc/T_QarVM5eUI/s400/fig1sp500.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5434503840327695410" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. S&amp;P 500 non-stationary signal&lt;br /&gt;&lt;br /&gt;You can see that as we sample the average at various points it is constantly changing.  Another property of a unit root time series is that its variance keeps growing over time, so the series can drift without bound (or explode).  We need to somehow transform the time series into a stationary signal, so that the Neural Net can process and learn it. Not only is it necessary for the Neural Net to see similar if not repeating data over and over, but any values beyond the range of the internal squashing function will saturate at the rails of the processing elements.&lt;br /&gt;&lt;br /&gt;One of the things that you'll notice for many long term financial time series is that they grow exponentially, so a good candidate fit might be an exponential equation.  However, since we will be using decomposition detrending, I prefer to use a line fit.  In order to accomplish this, we can take the log of the data and later reverse the operation in post processing. 
Taking the log of exponential data turns the exponential trend into a linear one, so we can fit it with linear regression and subtract the fitted line from the log series to get closer to stationarity.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2tCfVHXV0I/AAAAAAAAAEs/ryGymlyw7nE/s1600-h/fig3logtransform.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 195px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2tCfVHXV0I/AAAAAAAAAEs/ryGymlyw7nE/s400/fig3logtransform.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5434510481364047682" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. Log Transformed Time Series&lt;br /&gt;&lt;br /&gt;Also, since we will be predicting the next day, we can simply use linear regression parameters, updated daily, to extrapolate the trend one day ahead. &lt;br /&gt;&lt;br /&gt;If we have a sufficient amount of data, we should see that the parameters settle to a stable limit, much as the running fraction of heads in repeated coin tosses converges to its limit.  If the parameters settle, we have some confidence that they will not change much from one sample prediction to the next.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2s-dsqbqlI/AAAAAAAAAEk/xEGsJZy6eXA/s1600-h/fig2slopeconvergence.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 287px; height: 400px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2s-dsqbqlI/AAAAAAAAAEk/xEGsJZy6eXA/s400/fig2slopeconvergence.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5434506055278897746" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. 
Dynamic Slope Settling of Linear Prediction Parameters&lt;br /&gt;&lt;br /&gt;Notice that the parameters have settled to a pretty stable value over the training period, implying that we don't expect them to change too wildly from the true value on the next predicted estimate.&lt;br /&gt;&lt;br /&gt;After we subtract the line regression from the log transformed signal we get our detrended and stationary signal.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2tIKVpM-RI/AAAAAAAAAE0/aDEVf6A5WgY/s1600-h/fig4detrenedseries.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 190px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2tIKVpM-RI/AAAAAAAAAE0/aDEVf6A5WgY/s400/fig4detrenedseries.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5434516717798488338" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 4. De-trended Log transformed signal&lt;br /&gt;&lt;br /&gt;Notice it appears much more stationary than the original time series. However, because the Neural Network does not get to see a lot of repetitive high frequency information over the time window, I will detrend once more with a faster smoothed representation.  First we will use a 100 period moving average as the new intermediate trend, then subtract a 25 period moving average to get the 2nd detrended series.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2tJkdQ43aI/AAAAAAAAAE8/ptPZ3qsr3v0/s1600-h/fig5_2nddetrendedseries.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 128px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2tJkdQ43aI/AAAAAAAAAE8/ptPZ3qsr3v0/s400/fig5_2nddetrendedseries.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5434518266032217506" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 5. 
Second de-trended series.&lt;br /&gt;&lt;br /&gt;Notice that even this small sample gives the Neural Network a much better signal from which to learn subtle patterns in the time series, and that the series is now well behaved with respect to stationarity.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S26aBXEAbjI/AAAAAAAAAFE/FoaCYJn25HA/s1600-h/fig6stockpred.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 167px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S26aBXEAbjI/AAAAAAAAAFE/FoaCYJn25HA/s400/fig6stockpred.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5435451148444134962" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 6. Reconstructed prediction Out of Sample&lt;br /&gt;&lt;br /&gt;The figure above shows an example of a stock series that has been decomposed, smoothed, and then recomposed with the 100 and 25 period moving averages, with the out of sample period shown.  There is a very good correlation between predicted and actual smoothed estimates.  Such a system might be utilized in a moving average crossover prediction to gain a 1 day advantage in estimating momentum.  There are some very small discrepancies between predicted and actual values; I believe they are due to one small problem I've had with Weka.  Weka's output is limited to 3 digits of numerical precision. 
On the nabble forum they have mentioned a newer option in Subversion, but I haven't had a chance to play with it yet.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-237779198559676937?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/237779198559676937/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/practical-implementation-of-neural_04.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/237779198559676937'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/237779198559676937'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/practical-implementation-of-neural_04.html' title='Practical Implementation of Neural Network based time series (stock) prediction  -PART 4'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2s8cxU6fDI/AAAAAAAAAEc/T_QarVM5eUI/s72-c/fig1sp500.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-8801750100777513886</id><published>2010-02-01T18:03:00.000-08:00</published><updated>2010-02-14T15:53:34.648-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Practical Implementation of Neural Network based Time Series (Stock) 
Prediction - PART 3'/><title type='text'>Practical Implementation of Neural Network based Time Series (Stock) Prediction - PART 3</title><content type='html'>Ok, now that we have seen how well the perfect sine wave signal was learned, let's turn it up a notch and see how well the complex sine wave was learned.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2eIR9A0y9I/AAAAAAAAADE/Hvuja3GrWZE/s1600-h/fig1_pt3_complex_rsltplot.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 172px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2eIR9A0y9I/AAAAAAAAADE/Hvuja3GrWZE/s400/fig1_pt3_complex_rsltplot.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433461317462969298" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Summary of Actual Vs. Predicted out of sample complex sine waveform&lt;br /&gt;&lt;br /&gt;Uh Oh. What happened, the out of sample data does not look quite as good.  But, let's take a look at the summary statistics.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eI6LmwJVI/AAAAAAAAADM/hNCBUUf_bUs/s1600-h/figg_2_weka_rslts.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 312px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eI6LmwJVI/AAAAAAAAADM/hNCBUUf_bUs/s400/figg_2_weka_rslts.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433462008574911826" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. Weka Summary for Actual vs Predicted OOS complex sine waveform&lt;br /&gt;&lt;br /&gt;We see that the rmse went way up from 0 to about .92, even though the correlation coefficient is still pretty good looking.  
What's happening is that even though the signal is still perfectly deterministic, the NN needs more training data or more work on the architecture to approximate the new function properly.&lt;br /&gt;&lt;br /&gt;Lastly, let's add some random noise to the signal.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2eVHPlmt2I/AAAAAAAAAD0/RIL1a4ILuaU/s1600-h/fig3_noiseadded.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 166px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2eVHPlmt2I/AAAAAAAAAD0/RIL1a4ILuaU/s400/fig3_noiseadded.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433475427121674082" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;fig 3. complex sin with noise added.&lt;br /&gt;&lt;br /&gt;And let's try to train on the random signal.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eUWHm04XI/AAAAAAAAADs/iKQSQtsTzdE/s1600-h/fig4_act_vs_pred_rslts_nois.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 175px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eUWHm04XI/AAAAAAAAADs/iKQSQtsTzdE/s400/fig4_act_vs_pred_rslts_nois.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433474583165722994" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 4. Actual vs prediction complex sin with noise added&lt;br /&gt;&lt;br /&gt;We see that the predictions are starting to look downright bad.&lt;br /&gt;The rmse went to .3, but it can be a bit misleading as the signal magnitude of the predicted waveform has dropped considerably.  
More importantly the correlation coefficient dropped from .9 down to .3.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eN86jZ8CI/AAAAAAAAADk/qyuJxGomyxU/s1600-h/fig5_rstls.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 312px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eN86jZ8CI/AAAAAAAAADk/qyuJxGomyxU/s400/fig5_rstls.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433467553095217186" /&gt;&lt;/a&gt;&lt;br /&gt; &lt;br /&gt;fig 5. Weka summary of Results.&lt;br /&gt;&lt;br /&gt;Although the rmse doesn't look too bad, the correlation coefficient dropped from .9 all the way to .3 and relative error jumped from 15% to 97%.&lt;br /&gt;&lt;br /&gt;Conclusion, the more noisy or high frequency the signal we train, the worse the results.  Let's try to understand this from a different perspective.&lt;br /&gt;&lt;br /&gt;Let's think about why the first simple complex predictions were so nice.&lt;br /&gt;What does a neural network really do?  You might have heard that it is a universal function approximator.  This is essentially true.  Just as a line fit, y=mx+b is a universal linear function estimate, a neural network thrives on learning any non-linear unknown general function.  
But, let's have a look at the scatterplot of only the original sine vs its previous lagged value.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2ed3RarTWI/AAAAAAAAAD8/k0wWh8jRNPI/s1600-h/fig6_scatter1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 386px; height: 400px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2ed3RarTWI/AAAAAAAAAD8/k0wWh8jRNPI/s400/fig6_scatter1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433485048339451234" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 6. Scatterplot of perfect sine vs. lagged one value and time series plot.&lt;br /&gt;&lt;br /&gt;What we notice is that when we lag the sine against itself, we see a nice deterministic pattern as we expect. This pattern is also sometimes called a Lissajous pattern.  But, what happens when we try to predict a value from only the previous lagged value?  There are two possible outputs, pt A and pt B.  If you recall way back in algebra, a function maps each point in its domain to one and only one output, but here we see there are two.  Therefore, even if the model were perfect, it could never properly predict the next value as there are two possible outcomes; it's about as good as a coin toss. So the actual predicted result would be the average of the two possible output states.  But, remember we added lagged values as inputs to be trained on.  
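&lt;br /&gt;&lt;br /&gt;As a quick numerical check of this ambiguity, here is a sketch in which the 20 samples per cycle and the rounding to 6 decimals are arbitrary choices. With one lag, the same previous value is followed by two different next values (one on the way up, one on the way down). The second check anticipates the two lag case discussed next: a pure sine obeys the identity x[n+1] = 2*cos(w)*x[n] - x[n-1], so the prior two values determine the next one exactly.&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```python
import math

w = 2 * math.pi / 20          # assumed: 20 samples per cycle
x = [math.sin(w * n) for n in range(200)]

# One lag: collect every value that followed (roughly) the same x[n].
followers = {}
for n in range(len(x) - 1):
    followers.setdefault(round(x[n], 6), set()).add(round(x[n + 1], 6))
ambiguous = sum(1 for nxt in followers.values() if len(nxt) > 1)
print("one-lag inputs with more than one outcome:", ambiguous)

# Two lags: the next value follows exactly from the previous two.
max_err = max(abs(x[n + 1] - (2 * math.cos(w) * x[n] - x[n - 1]))
              for n in range(1, len(x) - 1))
print("largest two-lag prediction error:", max_err)
```
&lt;/pre&gt;
Only the peak and the trough have a single follower with one lag; every other level is ambiguous, while the two lag error is nothing but floating point round-off.&lt;br /&gt;&lt;br /&gt;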
Well, what happens when we do a scatterplot of the perfect sine against the prior two lagged values?&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2egyM2v-4I/AAAAAAAAAEE/DGsxZ0kFG2I/s1600-h/fig7_3dscatter.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 386px; height: 400px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2egyM2v-4I/AAAAAAAAAEE/DGsxZ0kFG2I/s400/fig7_3dscatter.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433488259750558594" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 7. 3D Scatterplot of perfect sine wave against two lagged values and ts plot.&lt;br /&gt;&lt;br /&gt;What we see is that by conditioning the function on the prior two lagged values as a pair, there is one and only one corresponding output point! There is no more ambiguity, therefore there exists a perfect function that can fit this conditional prediction.  This is why the first perfect sine with embedded variables had such a perfect fit on the neural network regression.  It is another way to think about how a neural network learns patterns and why using embedded dimensions or lagged variables to train on is useful.&lt;br /&gt;&lt;br /&gt;What happens though when we corrupt the sine with noise?&lt;br /&gt;Here is the scatterplot.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eis6ABWkI/AAAAAAAAAEM/HPfUInLLmTs/s1600-h/fig8_noisiysinscatter.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 396px; height: 400px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2eis6ABWkI/AAAAAAAAAEM/HPfUInLLmTs/s400/fig8_noisiysinscatter.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433490367813081666" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 8.  
Noisy Sine Scatterplot against lagged values.&lt;br /&gt;&lt;br /&gt;Look at all the possible ambiguous outputs each prior input predicts!  It's no wonder the poor neural network has a hard time learning.  It will either give some average output or, depending on the embedded dimension structure (lagged values), a very different prediction than we would expect.&lt;br /&gt;&lt;br /&gt;In conclusion, I hope I've given you some food for thought about what a neural net likes and how it learns well. &lt;br /&gt;&lt;br /&gt;It may need more than one lagged dimension to learn well and it does NOT like noisy inputs!  This is a problem I have found with a lot of the literature that uses neural networks for prediction and gives them a bad rap.  The results are summarized using metrics like hit rate as an objective function.  Yet on a noisy series this is like trying to track a coin toss; it's just not always the most useful objective.   &lt;br /&gt;&lt;br /&gt;I want you to also think about it another way, as it might apply to stock prediction.  Look at the following signal.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2ekTM8_EPI/AAAAAAAAAEU/vYqprou2qks/s1600-h/fig9.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 166px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2ekTM8_EPI/AAAAAAAAAEU/vYqprou2qks/s400/fig9.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5433492125247279346" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;fig 9. Momentum tracking with smooth sine&lt;br /&gt;&lt;br /&gt;Take a look at what we are doing by tracking the 'smoothed' version of the sine.&lt;br /&gt;We are simply tracking the momentum (up, down, and possibly sideways); that's it.  Or, another way to think about it: we are tracking the trend, but not each little wiggle.  
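&lt;br /&gt;&lt;br /&gt;For readers who like to check such things numerically, here is a rough Python sketch of the idea. Everything specific in it is an assumption made for illustration: the noise level (0.3), the 5 day causal moving average used as the smoother, and the helper names. It compares how strongly one day's change is correlated with the next day's change, before and after smoothing:&lt;br /&gt;&lt;pre&gt;
```python
import math
import random

PERIOD = 30   # days per cycle, as in the earlier parts
WINDOW = 5    # hypothetical smoothing length


def moving_average(values, window):
    """Causal moving average: each point uses only the current and past samples."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def steps(values):
    """Day-to-day changes, i.e. the momentum of the series."""
    return [b - a for a, b in zip(values[:-1], values[1:])]


def lag1_autocorr(values):
    """Sample correlation between a series and itself shifted by one step."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = [math.sin(2 * math.pi * t / PERIOD) + rng.gauss(0, 0.3)
         for t in range(600)]
smooth = moving_average(noisy, WINDOW)

raw_ac = lag1_autocorr(steps(noisy))      # momentum of the raw noisy series
smooth_ac = lag1_autocorr(steps(smooth))  # momentum of the smoothed series
print(round(raw_ac, 2), round(smooth_ac, 2))
```
&lt;/pre&gt;Under these assumptions the raw day-to-day changes are negatively correlated with each other (the noise keeps reversing itself), while the changes of the smoothed series are clearly positively correlated, which is what makes the smoothed momentum trackable.&lt;br /&gt;&lt;br /&gt;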
We can also see that there is strong serial correlation (autocorrelation) in the momentum, unlike in a high frequency raw time series.&lt;br /&gt;&lt;br /&gt;By using a 'smoothed' version of a signal, we can focus on tracking the signal and not the noise.  So things like hit rate are not that important. What's important is that we capture most of the meat of the trend.  A secondary benefit is that we do not get bucked around and churned as much.  In communications, we use something called a phase locked loop to track clock signals embedded in time domain noise (jitter). Here we are focusing on tracking the financial 'signal' embedded in the noise and not so much on each little fluctuation.  It is true there will be residual fluctuations, but the resulting drawdowns can be monitored through something like a statistical control chart. That lets the neural net focus on tracking the signal without getting bogged down in trying to track noise, which can be counterproductive.&lt;br /&gt;&lt;br /&gt;Another way to think about this issue is as follows. If you are familiar with econometrics, there is no shortage of models that try to predict all of the sharp turns and high frequency components (AR, ARMA family, etc.). Normally you will be told that if the residuals still have some serial correlation, you have not modeled the series well and the model needs additional fine tuning. That is all great if you are trying to perfectly back fit a model (deductively), but it works pretty badly out of sample (inductively), because you are essentially over-fitting the model.  One of the very interesting and successful concepts to come out of machine learning in recent years is the idea of ensemble averaging methods.  There are several tools like bagging, boosting, stacking, and committee voting that combine many predictions into an average one rather than relying on a single precise one.  
Predicting the averages has found much success, including the well known NETFLIX prize, where they stack learners.&lt;br /&gt;&lt;br /&gt;If this is starting to sound foreign to you, just think about the point of this post, which is to try to smooth the signal and follow the average, rather than predict the high frequency fluctuations.&lt;br /&gt;&lt;br /&gt;NEXT.  Part 4. The Stock Prediction example.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-8801750100777513886?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/8801750100777513886/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/practical-implementation-of-neural.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8801750100777513886'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/8801750100777513886'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/02/practical-implementation-of-neural.html' title='Practical Implementation of Neural Network based Time Series (Stock) Prediction - PART 3'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2eIR9A0y9I/AAAAAAAAADE/Hvuja3GrWZE/s72-c/fig1_pt3_complex_rsltplot.jpg' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-2855871561497713078</id><published>2010-01-30T17:50:00.000-08:00</published><updated>2010-05-25T16:45:41.182-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Practical Implementation of Neural Network based time series (stock) prediction - PART 2'/><title type='text'>Practical Implementation of Neural Network based time series (stock) prediction - PART 2</title><content type='html'>As a brief follow up to the series, I want to take a moment to describe a bit about Weka, which is the machine learning tool that we will be using to implement the neural network.  It is a fantastic open source JAVA based tool that was developed at the University of Waikato, New Zealand.  Users who are not all that experienced with programming have access to the GUI shell that makes running a regression or classification scenario a snap.  More advanced JAVA programmers may opt to use a command shell or customize their own classes.  In addition there are numerous support options, including a fantastic Nabble thread that you may subscribe to--&lt;br /&gt;&lt;a href="http://old.nabble.com/WEKA-f435.html"&gt; Weka thread &lt;/a&gt;  I have found that questions are answered very promptly and there is a lot of activity at the site, so you don't have to wait a long time to get a response.  
In addition there are some great books put out by Ian Witten and Eibe Frank that guide you through practical data mining with a minimal barrage of mathematical theory:&lt;br /&gt;&lt;a href="http://www.amazon.com/Data-Mining-Practical-Techniques-Management/dp/0120884070/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1264903087&amp;sr=8-1"&gt; Data Mining Practical Machine Learning Tools and Techniques With Java Implementations&lt;/a&gt;  I have the first edition and have found it an immensely useful reference.&lt;br /&gt;&lt;br /&gt;There are a variety of built in learning modules included in the free utility (Weka), such as linear regression, neural networks (a.k.a. multilayer perceptrons), decision trees, support vector machines, and even genetic algorithms.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2TlgGsWkPI/AAAAAAAAAA0/-U1DLOHeWuU/s1600-h/weka_gui1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 241px; height: 400px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2TlgGsWkPI/AAAAAAAAAA0/-U1DLOHeWuU/s400/weka_gui1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432719390230876402" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Using the Weka GUI &lt;br /&gt;&lt;br /&gt;In Fig 1, we see that the Weka GUI Chooser has been opened and the Explorer option selected.  The native format that Weka commonly uses is the .ARFF format. Fortunately for us, however, it also reads in .CSV files, which are easily created with a save option in Excel.  The file we will first train on is sim_training_set_perfect_sin.csv.  Once it is loaded, you will see all of the relevant variables in the Weka Explorer shell.  
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2T4bK9pJGI/AAAAAAAAACE/5dp_L5PUIYE/s1600-h/fig2_pt2back.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 299px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2T4bK9pJGI/AAAAAAAAACE/5dp_L5PUIYE/s400/fig2_pt2back.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432740196198720610" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. Loaded Excel csv training source file for Weka&lt;br /&gt;&lt;br /&gt;We notice some new variables have been introduced that were not in part 1.&lt;br /&gt;To understand why, let's show the CSV file that is used here.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2TneDPp8OI/AAAAAAAAABE/8YFztgYbSWk/s1600-h/fig2_training_sin-pt2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 283px; height: 400px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2TneDPp8OI/AAAAAAAAABE/8YFztgYbSWk/s400/fig2_training_sin-pt2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432721553968722146" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Training set variables.&lt;br /&gt;&lt;br /&gt;What we see is that the original perfect sine wave signal has been preserved in the column labeled signal.  The additional signals, s-1, s-2, s-3, s-4 are often called delayed or embedded (dimension) variables.  They are simply lagged values of the signal that are used to train the neural network.  There is no exact method to determine the number of lagged values, although a number of different methods exist. For now, we will simply accept that four delayed values of the signal are useful. The last column, called bias, is common to neural networks.  
The bias node gives the neural network a constant input whose weight is adjusted during training, which lets the network shift its output by a constant offset. For instance, imagine the signal we were learning had an average of 2.0.  The neural network needs to have some input that will track that constant value or it will have large offset errors that will obstruct convergence.  The bias node accomplishes that operation. Those familiar with engineering theory will recognize this node as a DC bias. &lt;br /&gt;&lt;br /&gt;OK, so one other thing we notice in the GUI is that Class:signal(num) is selected on the bottom right.  This is because we are predicting a numerical class, rather than a nominal one (which is the typical default for classification schemes).&lt;br /&gt;&lt;br /&gt;Next, we select the Classify tab to select our learning scheme, which in this case will be the MultilayerPerceptron.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2TpixmS9dI/AAAAAAAAABM/jvWIuEsD9Ew/s1600-h/fig4_mlpselect_pt2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2TpixmS9dI/AAAAAAAAABM/jvWIuEsD9Ew/s400/fig4_mlpselect_pt2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432723834154448338" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We then want to make sure certain options are selected.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2TqBIt34aI/AAAAAAAAABU/FrH6A6btiFA/s1600-h/fig5_MLP_options_pt2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width:400px; height: 364px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2TqBIt34aI/AAAAAAAAABU/FrH6A6btiFA/s400/fig5_MLP_options_pt2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432724355756319138" 
/&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We set nominalToBinaryFilter and normalizeAttributes to False, as we are not using nominal attributes and don't wish to modify the input data. However, we want normalizeNumericClass set to True; as mentioned earlier, this has Weka normalize the class (the value being predicted) to its internal limiting range, so we don't have to do it ourselves.  Also, we will train for 1000 epochs.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2TrGteZGLI/AAAAAAAAABc/9KRcrSmV7EU/s1600-h/fig6_pt2_MLP_options.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 364px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2TrGteZGLI/AAAAAAAAABc/9KRcrSmV7EU/s400/fig6_pt2_MLP_options.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432725551034472626" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 6. Preferences for MLP training model.&lt;br /&gt;&lt;br /&gt;We will build a model by training on the first 66% of the data.  We want to store and output the predictions so that we can visually see what they look like. Lastly, we select Preserve order for split, as it allows us to display the predicted out of sample time series in the original order.  
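&lt;br /&gt;&lt;br /&gt;If you would rather script the data preparation than build it in Excel, the lagged columns from Fig 3 and an order-preserving 66% split can be sketched in a few lines of Python. This is only an illustration of the idea, not what Weka runs internally; the series length of 804 points and the function names are made up for the example:&lt;br /&gt;&lt;pre&gt;
```python
import math

PERIOD = 30   # days per cycle of the perfect sine
N_LAGS = 4    # s-1 .. s-4, as in the training file


def make_rows(n_points):
    """Build rows of [signal, s-1, s-2, s-3, s-4] from a perfect sine."""
    signal = [math.sin(2 * math.pi * t / PERIOD) for t in range(n_points)]
    rows = []
    for t in range(N_LAGS, n_points):          # skip rows without full history
        lags = [signal[t - k] for k in range(1, N_LAGS + 1)]
        rows.append([signal[t]] + lags)
    return rows


def ordered_split(rows, train_fraction=0.66):
    """Keep time order: first part for training, the rest held out."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


rows = make_rows(804)              # hypothetical series length
train, test = ordered_split(rows)
print(len(rows), len(train), len(test))   # 800 528 272
```
&lt;/pre&gt;Because the rows are never shuffled, the held-out part is one contiguous stretch at the end of the series, which is what lets us plot predicted vs. actual values as a time series later on.&lt;br /&gt;&lt;br /&gt;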
With all of these features set, we simply click OK and the start button and it will quickly build our first Neural Network model!&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2TsFKR83oI/AAAAAAAAABk/7zFB7ecWQ-Y/s1600-h/fig7_pt2_MLP_simple_sin.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 301px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2TsFKR83oI/AAAAAAAAABk/7zFB7ecWQ-Y/s400/fig7_pt2_MLP_simple_sin.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432726623918808706" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 7. Results with summary of statistics console.&lt;br /&gt;&lt;br /&gt;If we scroll up we can see the actual weights that the model converged upon for our Multilayer Perceptron that will be used to predict the out of sample data.&lt;br /&gt;We can see that there is a nice printout of the last 34% of results  (271 out of sample data points) along with the predicted value and error, as well as a useful summary of statistics in the bottom of the console.  We often use Root mean squared error as a performance metric for neural net regressions. In this case, the number .0005 is quite good.  But let's use a little trick to get a visual inspection of just how good.  We can actually grab the data from the console (by selecting it with the left mouse button and dragging), then copy this data back into excel.  
As a result, we can then plot the actual versus predicted out of sample results inside of excel.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2Ttsvof8uI/AAAAAAAAABs/x-29TLbVPlc/s1600-h/fig8_pt2_MLP_excelimport.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 309px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2Ttsvof8uI/AAAAAAAAABs/x-29TLbVPlc/s400/fig8_pt2_MLP_excelimport.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432728403472020194" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 8. Importing prediction results back into Excel.&lt;br /&gt;&lt;br /&gt;Notice that we cut and paste the data from the Weka console back into Excel, but must select text to columns in order to separate the data back into columns.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2TuPFxQ_LI/AAAAAAAAAB0/dmulLKoU0B4/s1600-h/fig9_pt2_MLP_txttocolumns.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 305px;" src="http://3.bp.blogspot.com/_7YSZm5NIAmQ/S2TuPFxQ_LI/AAAAAAAAAB0/dmulLKoU0B4/s400/fig9_pt2_MLP_txttocolumns.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432728993529920690" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 9. Selecting the regions to separate as columns.&lt;br /&gt;&lt;br /&gt;And tada! We can now plot the predicted vs. actual values. And look how nicely they line up.  
The errors are extremely small on the out of sample set. Notice that some are 0 and others are .001, which is imperceptible to the eye without zooming way in on that point.&lt;br /&gt;It actually found an essentially perfect model for this time series (we will expand on why a bit later), and the remaining errors can largely be attributed to numerical precision.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2TvCbuJPYI/AAAAAAAAAB8/uUweYtZ-Nic/s1600-h/fig10_pt2_results.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 275px;" src="http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2TvCbuJPYI/AAAAAAAAAB8/uUweYtZ-Nic/s400/fig10_pt2_results.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5432729875595738498" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 10. Resulting plot of predicted vs. actual data.&lt;br /&gt;&lt;br /&gt;We have now built a basic Neural Network on a simple sine wave time series using Weka and Excel.  The predicted out of sample results were extremely good.&lt;br /&gt;However, as we will see, the data signal we used, the simple sine wave, is a very easy signal to learn as it is perfectly repetitive and stationary.  
We will see that as the signal gets increasingly complex, the prediction results do not work as well.&lt;br /&gt;That's it for Part 2, comments are welcome.&lt;a href="http://old.nabble.com/WEKA-f435.html"&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-2855871561497713078?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/2855871561497713078/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/01/practical-implementation-of-neural.html#comment-form' title='20 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/2855871561497713078'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/2855871561497713078'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/01/practical-implementation-of-neural.html' title='Practical Implementation of Neural Network based time series (stock) prediction - PART 2'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_7YSZm5NIAmQ/S2TlgGsWkPI/AAAAAAAAAA0/-U1DLOHeWuU/s72-c/weka_gui1.jpg' height='72' width='72'/><thr:total>20</thr:total></entry><entry><id>tag:blogger.com,1999:blog-107568321062020427.post-3482128625255666822</id><published>2010-01-29T21:59:00.001-08:00</published><updated>2010-02-16T17:11:46.301-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Practical Implementation of Neural Network based Stock Prediciton'/><title type='text'>Practical Implementation of Neural Network based time series (stock) prediction  - PART 1</title><content type='html'>The following introduction is to allow viewers to understand the basic concepts and practical implementation of neural nets towards a financial time series.  I will not go too deep into detail about the mathematics behind the neural net at the moment. My goal is to get you to understand practical details about how to actually implement a neural net using simple tools and models.  We will start with a simple model to understand a basic time series.  The time series waveform is a simple sine wave with the period set to 30 days.  It is implemented in excel as a source file to be processed in any Machine Learning capable software.  For this example I will be using a very good GUI Java based program called &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Weka&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2PN4NVWaRI/AAAAAAAAAAU/AWh6EOGEaw4/s1600-h/fig1_sin.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 366px; height: 242px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2PN4NVWaRI/AAAAAAAAAAU/AWh6EOGEaw4/s400/fig1_sin.jpg" alt="" id="BLOGGER_PHOTO_ID_5432411941074528530" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 1. Shows a simple sine wave set to a period (T) of 30 days.&lt;br /&gt;&lt;br /&gt;It is a very simple time series based upon the well known sine wave model.&lt;br /&gt;We can see that one complete cycle occurs over a period of 30 days. 
Each time step is set to 1 unit or day per step.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2PVEHRaUMI/AAAAAAAAAAk/-OgzugEXphg/s1600-h/fig2_complex_sin.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 227px;" src="http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2PVEHRaUMI/AAAAAAAAAAk/-OgzugEXphg/s400/fig2_complex_sin.jpg" alt="" id="BLOGGER_PHOTO_ID_5432419842187219138" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 2. A complex sinusoidal signal with f1 set to 1/T, where T=30 days.&lt;br /&gt;&lt;br /&gt;Anyone who has worked with financial time series knows that they can be far more complicated than simple sine based models, however, it is often better to learn from basic principles and move up in complexity in order to have a good grasp of what we are doing.  The second figure is a bit more complicated as it is the sum of three different sin based signals.  Each signal has a different Amplitude and Frequency associated with it. We could use Fourier Analysis to show the spectrum of the three different tones if we wished. However, for now we'll just accept that it is a complex signal.  Notice one property of this signal that is also a bit optimistic is that it is a stationary signal. Essentially a stationary signal has statistical properties that do not change over time. For example, if we were to sample the average from different slices, it would not change much. We also can visually see that the time series is mean reverting.  Financial time series differ in that they are not stationary, but are typically unit root and must often be transformed in order for the neural network to process them.  
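&lt;br /&gt;&lt;br /&gt;As a rough illustration of the 'average of different slices' check, here is a small Python sketch. The amplitudes and the two higher frequencies are invented for the example (only f1 = 1/T with T = 30 days comes from the figure), so treat it as a stand-in for the signal in Fig 2 rather than a reproduction of it:&lt;br /&gt;&lt;pre&gt;
```python
import math

PERIOD = 30  # T, in days

# (amplitude, cycles per PERIOD): the first tone is f1 = 1/T,
# the other two are hypothetical choices for illustration.
TONES = [(1.0, 1), (0.5, 2), (0.25, 3)]


def complex_signal(t):
    """Sum of three sine tones evaluated at day t."""
    return sum(a * math.sin(2 * math.pi * k * t / PERIOD) for a, k in TONES)


def slice_means(values, width):
    """Average of each consecutive, non-overlapping slice of the series."""
    return [sum(values[i:i + width]) / width
            for i in range(0, len(values) - width + 1, width)]


signal = [complex_signal(t) for t in range(600)]
means = slice_means(signal, 100)   # six slices of 100 days each
print([round(m, 3) for m in means])
```
&lt;/pre&gt;Every slice average stays close to zero compared with the swings of the signal itself, which is the behaviour we expect from a stationary, mean reverting series. A unit root series such as a raw price history would typically show slice averages that wander.&lt;br /&gt;&lt;br /&gt;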
The purpose of the complex signal, however, is to show how we can move to an increasingly complex signal from a very simple model.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2PX_J50-SI/AAAAAAAAAAs/aFWulP179cc/s1600-h/fig3_normalized_complex.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 224px;" src="http://2.bp.blogspot.com/_7YSZm5NIAmQ/S2PX_J50-SI/AAAAAAAAAAs/aFWulP179cc/s400/fig3_normalized_complex.jpg" alt="" id="BLOGGER_PHOTO_ID_5432423055529146658" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Fig 3. Normalized Complex Signal&lt;br /&gt;&lt;br /&gt;The final step is simply to normalize the time series so that it is constrained to a vertical range (what we call rails) of minus 1 to plus 1.  A typical neural net is limited by an internal function, sometimes called a squashing function.  It is a non-linear processing function, often a sigmoid or tanh (hyperbolic tangent) function, which saturate at (0,1) and (-1,1), respectively.&lt;br /&gt;A simple transformation can be produced by xnew = xold*(vmaxn-vminn)/(vmaxo-vmino).&lt;br /&gt;Here vmaxn and vminn are the new (target) maximum and minimum values, and vmaxo and vmino are the old maximum and minimum values of the time series. This pure rescaling is sufficient when both the old and the new range are centered on zero. 
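&lt;br /&gt;&lt;br /&gt;For reference, the same rescaling can be sketched in Python. This version also shifts the data, so it works even when the original series is not centered on zero; for a zero-centered series it reduces to the one-line formula above. The function name and the sample values are made up for illustration, and the rails default to plus and minus 0.9:&lt;br /&gt;&lt;pre&gt;
```python
def rescale(values, new_min=-0.9, new_max=0.9):
    """Linearly map values from their own [min, max] onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("cannot rescale a constant series")
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (x - old_min) * scale for x in values]


raw = [-3.0, -1.5, 0.0, 1.5, 3.0]       # hypothetical zero-centered samples
scaled = rescale(raw)
print([round(x, 3) for x in scaled])    # [-0.9, -0.45, 0.0, 0.45, 0.9]

shifted = [x + 2.0 for x in raw]        # same shape, average of 2.0
print([round(x, 3) for x in rescale(shifted)])  # [-0.9, -0.45, 0.0, 0.45, 0.9]
```
&lt;/pre&gt;Both series end up on the same rails because the shift is removed before scaling.&lt;br /&gt;&lt;br /&gt;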
In this case we will use -.9 and +.9 as the limiting rails so as to avoid saturation effects.  Often software will do the normalizing for you. In the case of Weka, you can choose to have it do this operation for you, in which case no manual normalization is necessary, although we should understand it for future reference.&lt;br /&gt;&lt;br /&gt;That's it for Part 1. Next we will investigate how to transport the data to Weka and have it build a model and predict the out of sample signal set!&lt;br /&gt;&lt;br /&gt;Please add any comments on where I can improve my tutorial, as I am new to the blogger scene and appreciate any feedback.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/107568321062020427-3482128625255666822?l=intelligenttradingtech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intelligenttradingtech.blogspot.com/feeds/3482128625255666822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/01/systems.html#comment-form' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3482128625255666822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/107568321062020427/posts/default/3482128625255666822'/><link rel='alternate' type='text/html' href='http://intelligenttradingtech.blogspot.com/2010/01/systems.html' title='Practical Implementation of Neural Network based time series (stock) prediction  - PART 1'/><author><name>Intelligent Trading</name><uri>http://www.blogger.com/profile/17765336450326139518</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_7YSZm5NIAmQ/S2PN4NVWaRI/AAAAAAAAAAU/AWh6EOGEaw4/s72-c/fig1_sin.jpg' height='72' width='72'/><thr:total>15</thr:total></entry></feed>
