Thursday, June 10, 2010
Quantitative Candlestick Pattern Recognition (HMM, Baum Welch, and all that)
Fig 1. Clustering-based approach to candlestick pattern recognition.
I've been reading a book titled The Quants that I'm sure will tantalize many traders with the ideas embedded within. Most notable (IMO) is the account of Renaissance's James Simons hiring a battery of cryptographers and speech recognition experts to decipher the code of the markets. Chief among the hired experts was Leonard Baum, co-developer of the Baum-Welch algorithm, an algorithm used to estimate the parameters of hidden Markov models. While I don't plan to divulge full details of my own work in this area, I do want to give a brief example of some methods that might be applied to these ideas.
Practitioners of classical TA have built up an enormous body of literature around candlesticks, the Japanese symbols used to encode each day's open, high, low, and close data in a single figure. The problem, as I see it, is that most of the available literature deals only with qualitative recognition of patterns, rather than quantitative.
We might want a more quantitative approach to analyzing this information, since it holds far more potential than closing prices alone (each candle carries four dimensions of information). The question is, how do we do this quantitatively?
One well-known method of recognizing patterns is supervised learning. In supervised learning, we feed the learner a correctly labeled set of examples to learn and configure itself from; over many iterations, it finds the configuration that minimizes the error between its output and the labels we supplied. For instance, we might build a black box to recognize hand-written characters by training a neural net, support vector machine, or other supervised learner. However, you probably don't want to spend hours classifying candlesticks by name. Those familiar with candlesticks will recognize numerous symbols: shooting star, hammer, doji, etc., each connoting a unique symbolic harbinger of the future to come. From a quantitative perspective, we are more interested in the Bayesian view, e.g. P(up day | hammer) = P(up day, hammer) / P(hammer).
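As a toy illustration of that conditional probability (the vectors below are hypothetical stand-ins, not real classifications):

# P(up day | hammer) estimated as #(hammer AND next day up) / #(hammer)
is_hammer <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # hammer on day t
up_next   <- c(FALSE, TRUE, FALSE, TRUE, TRUE)   # close up on day t+1
sum(is_hammer & up_next) / sum(is_hammer)        # 1/3 in this toy sample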
But how could we learn the corpus of symbols without the tedious method of identifying each one by hand? This is a problem better approached by unsupervised learning. In unsupervised learning, we don't need to train the learner on labeled examples; it finds relationships by itself, typically as a function of distance between exemplars. Please see the data mining text (Witten/Frank) in my recommended list to examine the concepts in more detail. In this case I am going to use a very common unsupervised learner called k-means clustering.
Fig 2. Graph of arbitrary window of QQQQ data
Fig 2 shows the familiar time-ordered candlestick plot. Using k-means clustering with a goal of six clusters, I tried to automatically learn six unique candlestick forms based on the H, L, and Cl data relative to the Open. The idea is that similar candlestick archetypes will tend to cluster together by distance.
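As a rough illustration (not my original code, some of which was custom), here is a minimal R sketch of this step using quantmod and the built-in kmeans; note that QQQQ has since become QQQ, and the date range here is arbitrary:

library(quantmod)

# Pull daily OHLC data (QQQQ now trades as QQQ)
getSymbols("QQQ", from = "2005-01-01", to = "2010-06-01")

# Express each candle relative to its open: H/O, L/O, Cl/O
feats <- cbind(Hi(QQQ)/Op(QQQ), Lo(QQQ)/Op(QQQ), Cl(QQQ)/Op(QQQ))
colnames(feats) <- c("H.O", "L.O", "C.O")

# k-means with 6 clusters; nstart re-runs the random initialization
set.seed(42)
km <- kmeans(coredata(feats), centers = 6, nstart = 25)
states <- km$cluster   # one symbol (1..6) per trading day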
Fig 3. Candlestick symbols sorted by 6 Clusters
Notice in Figure 3 that the k-means clustering approach automatically recognized large red bodies and large green bodies, and, more interestingly, collected a preponderance of hammers in cluster number 5.
So, given that we have identified a corpus of six symbols in our language, of what use might this be? Well, we can run a cross tabulation of our symbol states using a program like R.
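A minimal sketch of that tabulation, reusing the states vector from the clustering sketch above (prop.table row-normalizes the counts into transition frequencies):

# Pair each state with the state that follows it
trans <- table(from = head(states, -1), to = tail(states, -1))

# Conditional frequencies: P(next state | current state)
round(prop.table(trans, margin = 1), 3)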
Fig 4. Cross Tabulation of Clustered States.
One of the things that strikes me right away is the overwhelming number of pairs with state 1 following state 5; notice the 57% frequency trounces all other dependent states. Now what is interesting about this? Remember we established that state 5 corresponds to a hammer candlestick. Common intuition (at least from my years of reading) expects a hammer to be a turning point followed by an up move. Yet in our table it is overwhelmingly followed by state 1, which, if you look back at the sorted-by-cluster diagram, is a very big red down candlestick. This is completely opposite to what our common body of knowledge and intuition tells us.
In case this seems hard to fathom, we can re-sort the data, this time in original time order, but with clusters identified.
Fig 5. Visual inspection of hammer (state 5) likely followed by down candle (state 1)
We can identify the states via the re-sorted cluster ID staircase level (5), or use labels to more simply mark each hammer (5) and its following symbol. Notice that, contrary to common knowledge, our automatic recognition process and tabulated probability matrix found good corroboration in the visual inspection: in this simple window sample (resized to improve visibility), 4 of the 5 instances of the hammer (state 5) were followed by a big red down candle (state 1). One other comment: in the cases where state 5 is not followed by state 1 (say we bet on a down move), there is a further 14.3% chance of landing in state 6 on the next move, which brings our likelihood of a decent-sized down move to 71.4% overall.
We can take these simple quantitative ideas and extend them to dynamic Markov models, the Baum-Welch and Viterbi algorithms, and all that sophisticated stuff. Perhaps one day even mimicking the mighty Renaissance itself? I don't know, but any edge we can add to our arsenal will surely help.
Take some time to read The Quants if you want a great layman's view of many related quant approaches.
There may be some bugs in this post, as Google just updated the Blogger editing platform, and I'm trying to iron out some of the kinks.
Thank you for this article. I found this very interesting. I've been working with a related approach using classification algorithms based on the three parts of the candlestick (upper/lower shadow and body). Your text triggered some ideas...
Very nice! Care to post the code for this?
Thanks for posting this very interesting article!
ReplyDeleteI'd be interested in how exactly you came up with the k-means clustering. You mention Google - are you using the prediction API? Which part of your analysis was done in R?
Would you be willing to share your code with us, please? It would be very helpful for those still learning!
Thanks very much!
Dirk
Thanks for the replies,
I don't have all the code available to post, as some was custom. However, I believe most, if not all, can be replicated in R.
The k-means clustering is built into R (kmeans).
The probability matrix (prop.table) is also an R feature.
Accessing data, pre-processing, and graphing can be accomplished in R using the quantmod package.
IT
Apologies if this is a repost. I've been having buggy problems with Blogger the last few days.
How would you go about choosing the parameters for the k-means algorithm?
ReplyDeleteHi Craig,
1) I chose the candlestick parameters partly by experimentation and experience. I'm assuming you're referring to the H/O, Cl/O, L/O parameters. In a more practical application I don't actually limit it to that set; however, it makes it much easier to visualize and grasp in only 3 dimensions.
2) The k-means algorithm itself has only one major parameter (other than, say, the distance metric, which is typically Euclidean): the number of clusters to identify. In this case I used 6; however, a lot of that is part art, as I'm not certain there is an optimal value to be found in this application. One way to think about it is to imagine how many unique symbols exist in candlestick parlance (far more than six).
Cheers,
IT
Extremely cool. Thank you for sharing your thought process.
Great article. Definitely gave me something to think about.
I do have a question/comment. Recently I've been using chaos theory/dynamical systems methods for forecasting financial data. One of the crucial steps in my algo is the selection of the data used for getting the actual forecast. K-means clustering looks like something I could work with. But I can't simply copy what you did, as I am working in dimensions that cannot be visualized (at least 10). This prevents me from visually inspecting whether the clustering even makes sense (you did it in 3 dimensions). So my questions are: How many clusters are enough? Are there any statistics I could use as a benchmark in determining how many clusters I should use?
Looking forward to hearing from you.
Anton
Hi Anton,
Thanks for the kind comments. I've done some work with chaos, and although there's a lot to discuss, in most cases I've found that stock market time series are not all that chaotic in the deterministic sense. If you are familiar with phase space, you can easily see order in 2-dimensional embedded attractors for, say, the simple logistic equation (or think of the Lorenz attractor). It is much more difficult to see such simple order in financial series, however. In fact, I've been thinking about doing a blog post on this very concept soon.
Regarding dimensions: as I mentioned earlier to Craig, it's part art, part science. I might expand on this at some point, but as you pointed out, the 3 dimensions make it very intuitive to visualize how the clustering captures features that relate to common visual perspectives of candlesticks. However, even when clustering in 10 dimensions, you can still look at how it places the candlestick information in a lower (3D) space; you can see which features it identifies by sorting by cluster and observing the candlesticks in normal 2D space, as I did in the example. One other aspect I haven't covered is that the cluster centroids will change depending on the sample space and size, so you have to be careful about how you approach in-sample/out-of-sample data.
Also, in higher dimensions (like 10) you have to be aware of the curse of dimensionality: distances between data points spread out, and the points tend to concentrate near the edges of the hypercube. It's far less intuitive, so you might focus on different sample sets and on what measure you are trying to capture to determine whether the clustering holds relevant information (although you can't see it intuitively in higher-dimensional space, you can measure the information metrics in lower space).
IT
Dear IT
First, I would like to apologize for all the typos in my last post; it was really late when I was writing it. In fact, I'm stunned that you were able to make sense of some of the things I wrote.
Concerning chaos theory in forecasting, my opinion is that these methods (embedding the data, etc.) are nothing more than sophisticated pseudo pattern-recognition methods. For example, suppose you have an up move of 2%, then a down move of 1.5% (embedding dimension is 2); you find similar up-down pairs in the past, project those past points into "the future", mesh the projected points, and hope you can come up with something useful.
I actually observed (visually) the embedded data and couldn't find much order, so I was thinking cluster analysis could be useful here, since it would allow me to separate the data according to some rule.
Just a short comment: I have found articles in which the authors claim there is evidence of low-dimensional chaos in financial data, usually indices (think 2 to 5 dimensions), while some suggest embedding dimensions up to 100.
I don't think visual inspection is going to work in my case, since my algos are completely "automated"; I just pop in the data and let them work.
I did find something useful; it's called a "silhouette plot". It assigns each point (in each cluster) a number between -1 and 1: 1 indicates that the point is distant from neighbouring clusters, while -1 indicates that the point is probably in the wrong cluster. Taking the mean could be used as a benchmark of how well the clustering worked.
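A minimal R sketch of that benchmark (assuming the feats matrix from the clustering sketch in the post, plus the cluster package):

library(cluster)

d <- dist(coredata(feats))   # pairwise Euclidean distances
avg_sil <- sapply(2:10, function(k) {
  km <- kmeans(coredata(feats), centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
which.max(avg_sil) + 1       # k with the highest mean silhouette width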
Cheers
Anton
Anton,
No problem on the typos; I do the same myself all the time. I think one of the useful things about clustering is that it allows us to re-generate the features of a given data set into something that might prove more useful than the conventional representation. For instance, in chaos, lags and embedding with percent changes are the common method most people think of, and they often yield nothing beyond a cloud of random data. However, by transforming the data in creative ways, we might better be able to observe some type of existing order.
Ultimately, I've always said that formulating the problem is the most challenging part of math applied to trading; much more than understanding the actual math or machine learning.
IT
Hi IT,
I have been experimenting with k-means and am finding that the transition matrix is not stable from run to run, i.e. the probabilities are changing. There seems to be convergence when I have a low number of days to classify, but as the length of the series increases I see this convergence disappear.
In searching I have found numerous references to convergence to the cluster means, but not so much on uniqueness (I do have a paper to read that might be helpful).
So my question is whether you know if there is a unique set of clusters for a data set?
Just for reference I am using the kmeans function in Matlab.
Adam
Unless I am mistaken, k-means is non-deterministic. It starts with random centroids and, depending on the data, will often result in different outcomes when run multiple times.
Hi William,
That is absolutely correct. What is important is to make sure the validation set uses the same set of constraints as the test set. At the end of the day, unless you have very separated regions of information (in which case even random initializations should converge to equal means), you're simply using the algorithm to reduce the classification regions to a much smaller set.
If you truly wanted a deterministic cluster identifier, given such a randomly spaced field of data, you could simply force regions of interest by preselecting boundaries. For instance, given a two-dimensional grid, you can slice the grid up into n equally spaced partitions and use those partitions as your reference coordinates. This guarantees the reference coordinates will match in both sets.
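A sketch of that fixed-partition idea in R, reusing the feats matrix from the post (the bin count n is arbitrary):

n <- 4                         # bins per axis, so up to n^2 fixed regions
x <- coredata(feats)[, "H.O"]
y <- coredata(feats)[, "L.O"]

# Equal-width bins give deterministic region labels across data sets
bx <- cut(x, breaks = seq(min(x), max(x), length.out = n + 1), include.lowest = TRUE)
by <- cut(y, breaks = seq(min(y), max(y), length.out = n + 1), include.lowest = TRUE)
region <- interaction(bx, by, drop = TRUE)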
Hope that makes sense.
IT
Adam,
That's been my experience as well. I alluded to this a bit in my chaos article; have a look at the final excerpt at the end, regarding stability and pockets of order coalescing and dispersing.
I'll make one final comment: you might want to experiment with segments of data, using k-means to identify clusters in one segment and a supervised learner in the next (kNN, for example). Then you can look into various cross-validation types of experiments (walk-forward is preferable, IMO). A rough sketch follows. Hope that gives you some ideas.
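A minimal R sketch of that segment idea (using knn from the class package; the segment boundaries here are arbitrary and the feats matrix is from the post):

library(class)

X <- coredata(feats)
insample <- X[1:1000, ]      # cluster the first segment
oos      <- X[1001:1250, ]   # classify the next segment out of sample

km <- kmeans(insample, centers = 6, nstart = 25)

# Carry the in-sample cluster labels forward with a 5-nearest-neighbor vote
oos_state <- knn(train = insample, test = oos, cl = factor(km$cluster), k = 5)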
IT
Thanks for getting back to me.
That does give me some ideas.
I have come to the same conclusions as you did in your chaos post regarding order appearing and disappearing. It is good to see others finding this and to know that I am not missing something.
Anyway, keep up the good work on the blog. I will read all of it eventually.
Adam
Hi IT,
Now that I think of it, I may not have worded my question correctly.
What I meant to ask was: if you run the k-means clustering on a data set, then run it again on the same data set, should you necessarily get the same probabilities in the transition matrix? I understand that the rows/columns will interchange, as you are not forcing one candlestick pattern to be a certain cluster number.
Thanks
Adam
Adam,
Even if you run it on the same data set over and over, there is no guarantee that it will generate the globally best solution or even the same centroids. This is similar to training a neural network, in that the initialization of any learner typically starts with random values; since it will not likely evaluate every possible arrangement and each initialization is random, you can get different results per run.
In the case of k-means, each run starts with a random location for each cluster centroid. Several approaches have been developed to deal with this. In 2 dimensions, you can visualize the way it divided the candlestick clusters and see which solution makes intuitive sense to you.
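One common mitigation in R (a sketch; it improves stability but does not guarantee the global optimum) is to fix the seed and let nstart re-run the random initialization, keeping the best fit:

set.seed(1)                      # reproducible initialization
km <- kmeans(coredata(feats), centers = 6, nstart = 50)
km$tot.withinss                  # within-cluster SS of the retained solution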
Other issues to consider: outliers may skew an entire cluster! So k-medoids or similar learners are often preferred.
Going back to my earlier comment: even if you do not have a guaranteed optimal cluster arrangement, as long as you use it as a fixed reference, a supervised learner applied out of sample is guaranteed to use the particular clusters you found (and accepted), so you can compare transition matrices for the features you identified in sample. Hope that makes sense.
IT
Hi IT,
ReplyDeleteThanks for your reply.
That is pretty much what I figured; I just wanted to make sure that I was not doing something stupid.
Adam
Dear IT
Just wanted to let you know I did my "chaos analysis" using k-means clustering, and unfortunately the improvement was slim to none. I'm becoming increasingly sceptical that there will be an easy way to forecast financial data using chaos theory.
Cheers
Anton
I have to say that I got inspired by your approach, since I had heard about k-means at university. I spent a couple of days doing my analysis, but I got a bit stuck on how to use it further.
I applied it to hourly forex data (EURUSD, as an example) and tried out feature spaces of different dimensions, mainly concerning the last 3 candles.
I then did some statistical analysis on the returns (mostly 1-5 bars later) conditioned on the cluster classification. Unfortunately, it got a bit frustrating, since I haven't found statistical significance over different sets of samples, so I couldn't reject the null that the returns are no different.
So I would appreciate it if you could elaborate in a bit more detail on how you further process your data once it is divided into the different clusters.
I just can't believe that there is no difference in return distribution depending on the pattern...
Many thanks again for the post
Cheers
Nic
nicolas,
Thanks for your reply; I've been a bit busy lately, so apologies for the delay.
I tried to give some details (such as the transition matrices). Once you have assigned clusters, the cluster states become the HMM states. With regard to evaluating from a statistical point of view, it's a bit of an art to get OOS behavior that rejects the null of chance-based returns.
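As a crude stand-in for the full HMM machinery (a sketch only, assuming all six states occur in the states vector from the post), the fitted transition matrix can be treated as a first-order Markov chain and simulated:

# Row-normalized transition matrix from the cluster-state sequence
P <- prop.table(table(head(states, -1), tail(states, -1)), margin = 1)

sim <- integer(100)
sim[1] <- 5                    # start in the "hammer" state
for (t in 2:100) {
  sim[t] <- sample(1:6, 1, prob = P[sim[t - 1], ])
}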
Hopefully, you'll get some inspiration on the NLP thread.
Cheers,
IT
So how would you define a distance metric between the two candles? Thanks.
ReplyDeleteIn this case, each candle attribute is expressed as a coordinate defined by relative distances to open (i.e. H to O, L to O, Cl to O). Once the three dimensional coordinate is defined, the cluster algorithm in R will choose the default distance measurement (I'd have to check, but I think it's typically euclidean) to cluster by distance between similar candlesticks.
Thank you for the comment on distance.
IT
I wanted to leave a quick note to thank you for setting up your blog. I accidentally bumped into it and am very excited to read your posts. For a guy who was lost in TA, you have given me new ideas to pursue, and for that I thank you for sharing your ideas.
Regards
marketdust
Thank you for the comments, marketdust.
Brilliant! I was playing with Minitab myself, trying to figure out classifications. Glad I bumped into your post. Some really nice ideas!
ReplyDeleteMay I know how many days are used for the clustering.
ReplyDeleteHi Nammik,
It's been a while since I published this study. I would guess it was about 10 years of data on 1 instrument.
Best,
IT