
We wish to use a GA to maximize the function f(X) = 6X1X2X3 / (X1 + 2X2 + 2X3)


Question

We wish to use a GA to maximize the function f(X) = 6X1X2X3 / (X1 + 2X2 + 2X3), subject to the constraint X1X2X3 = 16.

a. How would you encode this problem and why?

b. Show how you would handle the constraint using a penalty approach.

c. Using your encoding scheme, show a randomly generated population of size 6. Assume the members have the fitness values shown below.

    Member   Fitness
    1        963
    2        1147
    3        291
    4        553
    5        753
    6        1394

d. Given the random numbers r = (0.1, 0.6, 0.7, 0.3, 0.2, 0.5, 0.8, 0.2, 0.4, 0.9, 0.1, 0.1, 0.4, 0.6, 0.8, 0.3, 0.6), use two-body tournament selection to populate a mating pool. Explain how the random numbers are being used in your scheme. Illustrate the tournaments and the outcomes.

e. Illustrate crossover for your encoding and your mating pool using two appropriate members.

f. Show how mutation might occur with your encoding and a member of your population.

Explanation / Answer

Greedy Algorithms:

Huffman Coding (Read: Section 16.3 in CLRS).

Huffman Codes: Huffman codes provide a method of encoding data efficiently. Normally, when characters are coded using standard codes like ASCII, each character is represented by a fixed-length codeword of bits (e.g. 8 bits per character). Fixed-length codes are popular because it is very easy to break a string up into its individual characters, and to access individual characters and substrings by direct indexing. However, fixed-length codes may not be the most efficient from the perspective of minimizing the total quantity of data.

Consider the following example. Suppose that we want to encode strings over the (rather limited) 4-character alphabet C = {a, b, c, d}. We could use the following fixed-length code:

    Character               a    b    c    d
    Fixed-length codeword   00   01   10   11

A string such as "abacdaacac" would be encoded by replacing each of its characters with the corresponding binary codeword:

    a  b  a  c  d  a  a  c  a  c
    00 01 00 10 11 00 00 10 00 10

The final 20-character binary string would be "00010010110000100010".

Now, suppose that you knew the relative probabilities of characters in advance. (This might happen by analyzing many strings over a long period of time. In applications like data compression, where you want to encode one file, you can just scan the file and determine the exact frequencies of all the characters.) You can use this knowledge to encode strings differently: frequently occurring characters are encoded using fewer bits and less frequent characters are encoded using more bits. For example, suppose that characters are expected to occur with the following probabilities. We could design a variable-length code which would do a better job.

    Character                 a      b      c      d
    Probability               0.60   0.05   0.30   0.05
    Variable-length codeword  0      110    10     111

Notice that there is no requirement that the alphabetical order of the characters correspond to any sort of ordering applied to the codewords. Now, the same string would be encoded as follows:

    a  b    a  c   d    a  a  c   a  c
    0  110  0  10  111  0  0  10  0  10

Thus, the resulting 17-character string would be "01100101110010010", a savings of 3 characters over the fixed-length code.

More generally, what would be the expected savings for a string of length n? For the 2-bit fixed-length code, the length of the encoded string is just 2n bits. For the variable-length code, the expected length of a single encoded character is equal to the sum of the code lengths times the respective probabilities of their occurrences, and the expected encoded string length is just n times the expected encoded character length:

    n(0.60 · 1 + 0.05 · 3 + 0.30 · 2 + 0.05 · 3) = n(0.60 + 0.15 + 0.60 + 0.15) = 1.5n.

Thus, this would represent a 25% savings in expected encoding length. The question that we will consider today is how to form the best code, assuming that the probabilities of character occurrences are known.

Prefix Codes: One issue that we didn't consider in the example above is whether we will be able to decode the string once it has been encoded. In fact, this code was chosen quite carefully. Suppose that instead of coding the character 'a' as 0, we had encoded it as 1. Now the encoded string "111" is ambiguous: it might be "d" and it might be "aaa". How can we avoid this sort of ambiguity? You might suggest that we add separation markers between the encoded characters, but this would tend to lengthen the encoding, which is undesirable. Instead, we would like the code to have the property that it can be uniquely decoded.
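To make the comparison concrete, here is a minimal Python sketch that encodes the example string under both codes and computes the expected bits per character. The code tables come from the example above; the helper names encode and expected_length are illustrative, not from the notes.

    fixed_code = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
    var_code   = {'a': '0',  'b': '110', 'c': '10', 'd': '111'}
    prob       = {'a': 0.60, 'b': 0.05, 'c': 0.30, 'd': 0.05}

    def encode(text, code):
        # Replace each character by its codeword and concatenate.
        return ''.join(code[ch] for ch in text)

    def expected_length(code, prob):
        # Expected number of bits per encoded character.
        return sum(prob[ch] * len(code[ch]) for ch in code)

    text = "abacdaacac"
    print(encode(text, fixed_code))           # 00010010110000100010 (20 bits)
    print(encode(text, var_code))             # 01100101110010010 (17 bits)
    print(expected_length(fixed_code, prob))  # 2.0 bits per character
    print(expected_length(var_code, prob))    # 1.5 bits per character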
Note that in both of the codes given in the example above, no codeword is a prefix of another. This turns out to be the key property. Observe that if two codewords did share a common prefix, e.g. a → 001 and b → 00101, then when we see 00101... how do we know whether the first character of the encoded message is a or b? Conversely, if no codeword is a prefix of any other, then as soon as we see a codeword appearing as a prefix in the encoded text, we know that we may decode it without fear of it matching some longer codeword. Thus we have the following definition.

Prefix Code: An assignment of codewords to characters so that no codeword is a prefix of any other.

Observe that any binary prefix code can be described by a binary tree in which the codewords are the leaves of the tree, and where a left branch means "0" and a right branch means "1". The length of a codeword is just its depth in the tree. The code given earlier is a prefix code, and its corresponding tree is shown in the following figure.

[Fig. 17: Prefix codes — the tree for the code a = 0, c = 10, b = 110, d = 111.]

Decoding a prefix code is simple. We just traverse the tree from root to leaf, letting the input bits tell us which branch to take. On reaching a leaf, we output the corresponding character and return to the root to continue the process.

Expected encoding length: Once we know the probabilities of the various characters, we can determine the total length of the encoded text. Let p(x) denote the probability of seeing character x, and let d_T(x) denote the length of its codeword (its depth in the tree) relative to some prefix tree T. The expected number of bits needed to encode a text with n characters is given by the following formula:

    B(T) = n · Σ_{x ∈ C} p(x) · d_T(x).

This suggests the following problem:

Optimal Code Generation: Given an alphabet C and the probabilities p(x) of occurrence for each character x ∈ C, compute a prefix code T that minimizes the expected length of the encoded bit-string, B(T).

Note that the optimal code is not unique. For example, we could have complemented all of the bits in our earlier code without altering the expected encoded string length. There is a very simple algorithm for finding such a code. It was invented in the early 1950s by David Huffman, and is called a Huffman code. By the way, this code is used by the Unix utility pack for file compression. (There are better compression methods, however. For example, compress, gzip and many others are based on a more sophisticated method called Lempel-Ziv coding.)

Huffman's Algorithm: Here is the intuition behind the algorithm. Recall that we are given the occurrence probabilities for the characters. We are going to build the tree up from the leaf level. We take two characters x and y, and "merge" them into a single super-character called z, which then replaces x and y in the alphabet. The character z has a probability equal to the sum of x's and y's probabilities. Then we continue recursively building the code on the new alphabet, which has one fewer character. When the process is completed, we know the code for z, say 010. We then append a 0 and a 1 to this codeword, giving 0100 for x and 0101 for y. Another way to think of this is that we merge x and y as the left and right children of a root node called z. Then the subtree for z replaces x and y in the list of characters. We repeat this process until only one super-character remains.
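As an illustration of the decoding procedure just described, the sketch below walks the code tree from root to leaf for each group of bits. It assumes the example code a = 0, c = 10, b = 110, d = 111; the trie-as-nested-dicts representation and the names build_tree and decode are just one possible choice, not taken from the notes.

    code = {'a': '0', 'b': '110', 'c': '10', 'd': '111'}

    def build_tree(code):
        # Build a binary trie; internal nodes are dicts, leaves are characters.
        root = {}
        for ch, word in code.items():
            node = root
            for bit in word[:-1]:
                node = node.setdefault(bit, {})
            node[word[-1]] = ch
        return root

    def decode(bits, root):
        # Traverse root-to-leaf; on reaching a leaf, emit the character and restart.
        out, node = [], root
        for bit in bits:
            node = node[bit]
            if isinstance(node, str):
                out.append(node)
                node = root
        return ''.join(out)

    tree = build_tree(code)
    print(decode("01100101110010010", tree))  # abacdaacac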
The resulting tree is the final prefix tree. Since x and y will appear at the bottom of the tree, it seems most logical to select the two characters with the smallest probabilities to merge. The result is Huffman's algorithm, illustrated in the following figure.

[Fig. 18: Huffman's Algorithm — an example run on the probabilities a: 0.05, b: 0.48, c: 0.07, d: 0.17, e: 0.10, f: 0.13, repeatedly merging the two smallest remaining probabilities until a single tree remains.]

The pseudocode for Huffman's algorithm is given below. Let C denote the set of characters. Each character x ∈ C is associated with an occurrence probability x.prob. Initially, the characters are all stored in a priority queue Q. Recall that this data structure can be built initially in O(n) time, and that we can extract the element with the smallest key in O(log n) time and insert a new element in O(log n) time. The objects in Q are sorted by probability. Note that with each execution of the for-loop the number of items in the queue decreases by one, so after n − 1 iterations exactly one element is left in the queue, and this is the root of the final prefix code tree.

    Huffman(int n, character C[1..n]) {
        Q = C;                            // priority queue keyed on probability
        for i = 1 to n-1 {
            z = new internal tree node;
            z.left  = x = Q.extractMin(); // extract the two smallest probabilities
            z.right = y = Q.extractMin();
            z.prob  = x.prob + y.prob;    // z's probability is their sum
            Q.insert(z);                  // insert z back into the queue
        }
        return the last element left in Q as the root;
    }

Correctness: The big question that remains is why this algorithm is correct. Recall that the cost of any encoding tree T is B(T) = Σ_x p(x) · d_T(x) (we drop the constant factor n here, since it does not affect which tree is optimal). Our approach will be to show that any tree that differs from the one constructed by Huffman's algorithm can be converted into one that is equal to Huffman's tree without increasing its cost. First, observe that the Huffman tree is a full binary tree, meaning that every internal node has exactly two children. It would never pay to have an internal node with only one child (since such a node could be deleted), so we may limit consideration to full binary trees.

Claim: Consider the two characters x and y with the smallest probabilities. Then there is an optimal code tree in which these two characters are siblings at the maximum depth in the tree.

Proof: Let T be any optimal prefix code tree, and let b and c be two siblings at the maximum depth of the tree. Assume without loss of generality that p(b) ≤ p(c) and p(x) ≤ p(y) (if this is not true, then rename these characters). Now, since x and y have the two smallest probabilities, it follows that p(x) ≤ p(b) and p(y) ≤ p(c). (In both cases they may be equal.) Because b and c are at the deepest level of the tree, we know that d(b) ≥ d(x) and d(c) ≥ d(y). (Again, they may be equal.) Thus we have p(b) − p(x) ≥ 0 and d(b) − d(x) ≥ 0, and hence their product is nonnegative. Now switch the positions of x and b in the tree, resulting in a new tree T′. This is illustrated in the following figure.

[Fig. 19: Correctness of Huffman's Algorithm — swapping x with b, and then y with c, each changes the cost by a nonpositive amount.]

Next let us see how the cost changes as we go from T to T′. Almost all the nodes contribute the same to the expected cost; the only exceptions are nodes x and b. By subtracting the old contributions of these nodes and adding in the new contributions, we have

    B(T′) = B(T) − p(x)d(x) + p(x)d(b) − p(b)d(b) + p(b)d(x)
          = B(T) + p(x)(d(b) − d(x)) − p(b)(d(b) − d(x))
          = B(T) − (p(b) − p(x))(d(b) − d(x))
          ≤ B(T),

because (p(b) − p(x))(d(b) − d(x)) ≥ 0.
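For reference, here is a minimal runnable counterpart to the pseudocode above — a sketch, not the notes' own code. It uses Python's heapq module as the priority queue, with a tie-breaking counter so that entries of equal probability compare cleanly; the function names are illustrative.

    import heapq

    def huffman_code(prob):
        # Heap entries are (probability, tie-breaker, tree); a tree is either
        # a character (leaf) or a (left, right) pair (internal node).
        heap = [(p, i, ch) for i, (ch, p) in enumerate(prob.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p1, _, left = heapq.heappop(heap)    # two smallest probabilities
            p2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, counter, (left, right)))
            counter += 1
        _, _, root = heap[0]
        code = {}
        def assign(node, word):
            if isinstance(node, tuple):          # internal node: recurse
                assign(node[0], word + '0')
                assign(node[1], word + '1')
            else:                                # leaf: record its codeword
                code[node] = word or '0'
        assign(root, '')
        return code

    # Probabilities from the example in Fig. 18.
    prob = {'a': 0.05, 'b': 0.48, 'c': 0.07, 'd': 0.17, 'e': 0.10, 'f': 0.13}
    code = huffman_code(prob)
    print(code)
    print(sum(prob[ch] * len(code[ch]) for ch in prob))  # expected bits per character, 2.16 here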

Thus the cost does not increase, implying that T′ is also an optimal tree. By switching y with c we get a new tree T″, which by a similar argument is also optimal. The final tree T″ satisfies the statement of the claim.

The above claim asserts that the first step of Huffman's algorithm is essentially the proper one to perform. The complete proof of correctness for Huffman's algorithm follows by induction on n (since with each step we eliminate exactly one character).

Claim: Huffman's algorithm produces an optimal prefix code tree.

Proof: The proof is by induction on n, the number of characters. For the basis case, n = 1, the tree consists of a single leaf node, which is obviously optimal. Assume inductively that, whenever there are strictly fewer than n characters, Huffman's algorithm is guaranteed to produce an optimal tree. We want to show that this is true with exactly n characters. Suppose we have exactly n characters. The previous claim states that we may assume that in the optimal tree the two characters of lowest probability, x and y, are siblings at the lowest level of the tree. Remove x and y, replacing them with a new character z whose probability is p(z) = p(x) + p(y). Thus n − 1 characters remain. Consider any prefix code tree T made with this new set of n − 1 characters.

We can convert it into a prefix code tree T′ for the original set of characters by undoing the previous operation and replacing z with x and y (adding a "0" bit for x and a "1" bit for y). The cost of the new tree is

    B(T′) = B(T) − p(z)d(z) + p(x)(d(z) + 1) + p(y)(d(z) + 1)
          = B(T) − (p(x) + p(y))d(z) + (p(x) + p(y))(d(z) + 1)
          = B(T) + (p(x) + p(y))(d(z) + 1 − d(z))
          = B(T) + p(x) + p(y).

Since the change in cost does not depend in any way on the structure of the tree T, to minimize the cost of the final tree T′ we need to build the tree T on the n − 1 characters optimally. By induction, this is exactly what Huffman's algorithm does. Thus the final tree is optimal.
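As a small numeric sanity check of the identity B(T′) = B(T) + p(x) + p(y), here is a sketch using the earlier 4-character example. The depths are read off the prefix tree a = 0, c = 10, b = 110, d = 111, with the two smallest-probability characters b and d merged into a new character z; the variable names are illustrative.

    p = {'a': 0.60, 'b': 0.05, 'c': 0.30, 'd': 0.05}

    depth_full    = {'a': 1, 'c': 2, 'b': 3, 'd': 3}  # tree T' on the full alphabet
    depth_reduced = {'a': 1, 'c': 2, 'z': 2}          # tree T with b, d merged into z
    p_reduced     = {'a': p['a'], 'c': p['c'], 'z': p['b'] + p['d']}

    B_full    = sum(p[x] * depth_full[x] for x in p)
    B_reduced = sum(p_reduced[x] * depth_reduced[x] for x in p_reduced)
    # Both print 1.5 (up to floating-point rounding), as the identity predicts.
    print(B_full, B_reduced + p['b'] + p['d'])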
