We started by discussing the goal of a genetic algorithm, which is to optimally find the candidate pool of rules that are superior to other potential rules. In our example of moving averages, we are seeking the values of parameters of the rule :
if ma(m) binop ma(n) then action.
*Note: binop is short for binary operator; in this case the binary operator is > or <.
The GA (genetic algorithm) works by encoding the rule set into a string of binary valued variables. For instance if we wanted to encode the moving average parameter
to 4 real decimal values, we could simply use a 2 bit string, where 00 = 0 decimal, 01= 1 decimal, 10 = 2 decimal and 11 = 3 decimal. We can encode up to 2^n values per bits contained in each string. If we wanted to encode 512 values, we would need a 9 bit string to encode this value (2^9=512).
Also, we can encode values other than decimal values as binary bits, for instance,
action = buy or sell, can be represented by 0 or 1. Greater or Less than (> or <) can be represented by 1 or 0, as well. In the end we will have a chromosome or total string that represents the rule we are trying to optimize. So the rule: if {ma(m) binop ma(n) then action} could be encoded by binary values, as each chromosome is represented by 4bits- 2bits- 4- bits- 2 bits, where each element of the rule string corresponds to the encoded values discussed above. Note that the encoded blocks would be comparable to genes.
Fig 1. Examples of Boolean Encoding
Once we encode our rule set into a boolen representation or string, we then want to generate a population of strings to select from. Typically, we start out by assigning random values to the parameters. For instance, we may start a population of 100 strings; each string representing a set of rules with different parameters.
One string could be if ma(10)<ma(50) then buy, another might be if ma(20)>ma(200) then sell. Once a population has initially been created, we need to create diversity and additionally successfully improve fitness in the offspring over successive generations.
The concept of fitness is perhaps one of the most elegant and flexible options that makes the GA such a powerful optimizer. In the decision tree learners and Neural Network learners we discussed, there are only one or two simple goals to train on (decision tree for instance trains towards goal of reducing information entropy, neural net trains on reducing fitted variance errors). The GA can use any goal you can imagine, which gives it unlimited flexibility compared to others. You could use
total gain as a goal, or sharpe ratio, or profit factor as goals. You could even combine goals. The fitness or goal is what you are trying to optimize. Keep in mind that a genetic algorithm, like any other learner does not guarantee you will find the absolute best, it may get caught in local maxima of the fitness landscape.
However, you can get more sophisticated and add other sub methods to try to avoid this.
Fig 2. Example of population of rules to be processed.
Once our initial population of parameter based rules has been created (randomly), we then want to think about how we achieve the goal of optimizing or finding the set of rule parameters that best optimizes our fitness. Note that each time we create a new population of offspring, we call this a new generation run or epoch. The first set of offspring or parents, commonly attempts to select some sample of the fittest members to be passed along to the next population. We could use a greedy method that just sorts or ranks the members and selects only from the best 50% to be passed to the next generation (known as truncation selection) or an alternate method is to use something called roulette selection. In roulette selection, we sample members of the original population based upon their normalized fitness. So if the best fitness was 20% of the value of the sum of all the finesses, we would copy over that string or rule with a 20% probability into the next generation. The same would be applied to the other fitness/string combinations. Ultimately, we end up with more of the offspring selected from the most fit candidates in the prior generation. Next, we want to assure some diversity in the offspring. Crossover operation achieves this by crossing over or swapping genes from one candidate and another. This is performed over the entire population to ensure diversity. Lastly, we use mutation to randomly select some number of string elements and flip them. It adds a bit more random diversity to the offspring, so that possibly some candidate may show up unexpectedly that has great performance (think unusual height in basketball for instance).
More bells and whistles can be added to improve performance. Tournament selection is another method that improves offspring selection by running a tournament between string candidates. The winning candidate gets passed along to the next generation.
Fig 3. Selection process.
We essentially run the optimization and diversity routines through each generation, and the best candidates get passed down through to the next generation until our number of generations has run out, or we specify an early stopping criterion.
In the case of our rule set, we expect it to converge to the best set of parameters
(moving average arguments, and binary greater or equal than operator), based upon the fitness goal we assign to it.
Next. Genetic Algorithm Systematic Trading Development-- Part 3
12 hours ago
Hey! Thanks for all the great posts. In Fig. 1, I think you meant the decimal 5 be represented by the binary string 101 rather than represented by the binary string 100.
ReplyDeleteYou are absolutely correct. I learned how to count a long time ago in binary, so sometimes I wrote out the truth tables quickly without verifying. It was a slip and has been corrected.
ReplyDeleteThanks for your keen observation as that was a very important point!
hi,
ReplyDeleteI do not see the point of using GA to find the best rule set. Would not it be easier to just simply calculate e.g. Sharpe ratios of different MA combinations in the in-sample period and then simply trade the one with the highest Sharpe in the out-of-sample? I want to say, that GA is just an optimization technique, such as pattern search,etc. Maybe you are just trying to show how one would go about coding the system using GA and you are showing just very simple example. However, I cannot imagine in what case such a procedure would be justified. I think however that I probably have tonnes to learn from you (as you are 10 years in financial markets and living from trading systems). Here is my blog: jrudy.wordpress.com
Intelligent Trading said...
ReplyDeleteHi Jozef,
GA is one optimization procedure that is useful in evaluating large search spaces. If the permutation set of your input rule space is small, then yes, it might be simple enough to just run the sharpe ratio or suitable metric of some trading rules and use that as an optimization criteria using simple sorting. However, as the permutations of inputs and rules become large, some type of optimization method like GA becomes more necessary.
Imagine you have a rule; if ma(m)>ma(n) then buy, and suppose m and n are swept over 30 variables each, that's 30^2 combinations and about ~1000 variable sets to sweep over. Then you have the < case to test, and buy and sell rule evaluations, so multiply by about 4. You are now sweeping over a set of about 4000 variables. But, what happens when you start to add additional boolean rules.
For example, if ma(m)[><]ma(n) [AND,OR] rsi(y) [><]rsi[x] then [buy,sell] elseif [var[t] [><]var [e]] then [buy,sell]... Do you see how the set of rules to evaluate is becoming impossibly large?
Yes, the example was meant to be very simple for intuition, as such a simple set of parameters could easily be evaluated via a sweep of variables over some range.
I like your blog at first peek, and will look a bit more.
Thanks for stopping in.
IT