

Question

2. A DNA sequence can be represented by a sequence of letters, where the "alphabet" has four letters: A (adenine), C (cytosine), T (thymine), and G (guanine). Suppose that such a sequence is generated randomly, where the letters are independent and the probabilities of A, C, T, and G are pA, pC, pT, and pG, where these sum to 1. (a) Determine the probability that a DNA sequence of length 4 takes on the value ACTG. Also determine the probability that a DNA sequence of length 4 takes on the value GTAC. (b) Determine the probability that a DNA sequence of length 4 consists of all A's, C's, T's, or G's. (c) Determine the probability that a DNA sequence of length 4 consists of four different letters, e.g., ACTG or GTAC, with no repeated letters among the four letters. (d) How many different sequences of length 4 consist of exactly two different letters?

Explanation / Answer

As provided in the question, we have the following data:

P(Adenine) = pA

P(Cytosine) = pC

P(Thymine) = pT

P(Guanine) = pG

Now we solve each part in turn:

a) We want the probability of the specific sequence ACTG. Because the letters are independent, this is the product of P(Adenine), P(Cytosine), P(Thymine), and P(Guanine).

Note that multiplication is commutative, so any specific sequence that uses each of A, C, T, and G exactly once has the same probability, whatever the order.

Therefore, P(ACTG) = P(GTAC) = pA * pC * pT * pG
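As a quick numerical check of part a), here is a minimal Python sketch. The probability values and the helper name `sequence_probability` are assumptions chosen only for illustration; the question leaves the probabilities symbolic.

```python
from math import prod, isclose

# Hypothetical letter probabilities (not from the question); they sum to 1.
probs = {"A": 0.1, "C": 0.2, "T": 0.3, "G": 0.4}

def sequence_probability(seq, probs):
    """Probability of one specific sequence when the letters are independent."""
    return prod(probs[letter] for letter in seq)

p_actg = sequence_probability("ACTG", probs)
p_gtac = sequence_probability("GTAC", probs)

# Both sequences use each letter exactly once, so the products agree
# (up to floating-point rounding).
print(isclose(p_actg, p_gtac))
```

With these assumed values both probabilities equal 0.1 * 0.2 * 0.3 * 0.4 = 0.0024.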

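b) Here all 4 letters of the sequence must be the same. For a single letter X, P(XXXX) = pX * pX * pX * pX = (pX)^4

where X is either A or T or C or G.

For example: P(TTTT) = pT * pT * pT * pT = (pT)^4

The four outcomes AAAA, CCCC, TTTT, and GGGG are mutually exclusive, so the probability asked for is the sum of their probabilities:

P(all four letters the same) = (pA)^4 + (pC)^4 + (pT)^4 + (pG)^4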
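The event in part b) is the union of the four mutually exclusive outcomes AAAA, CCCC, TTTT, and GGGG, so its probability is the sum of the four fourth powers. The sketch below compares that sum with a brute-force enumeration of all 4^4 = 256 sequences. The probability values are hypothetical and used only for illustration.

```python
from itertools import product
from math import prod, isclose

# Hypothetical letter probabilities (not from the question); they sum to 1.
probs = {"A": 0.1, "C": 0.2, "T": 0.3, "G": 0.4}

# Closed form: sum of (pX)^4 over the four letters.
closed_form = sum(p ** 4 for p in probs.values())

# Brute force: add up the probability of every length-4 sequence
# that contains only one distinct letter.
brute_force = sum(
    prod(probs[letter] for letter in seq)
    for seq in product(probs, repeat=4)
    if len(set(seq)) == 1
)

print(isclose(closed_form, brute_force))
```

With these assumed values the sum is 0.0001 + 0.0016 + 0.0081 + 0.0256 = 0.0354.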

c) From part a), each specific ordering of A, C, T, and G has probability pA * pC * pT * pG. There are 4! = 24 such orderings, and they are mutually exclusive, so their probabilities add:

P(four different letters) = 24 * pA * pC * pT * pG
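For part c), note that the event "four different letters" covers every ordering of A, C, T, and G, not just one of them. There are 4! = 24 orderings, each with probability pA * pC * pT * pG, so the event has probability 24 * pA * pC * pT * pG. The sketch below checks this by enumeration, again with hypothetical probability values.

```python
from itertools import product
from math import factorial, prod, isclose

# Hypothetical letter probabilities (not from the question); they sum to 1.
probs = {"A": 0.1, "C": 0.2, "T": 0.3, "G": 0.4}

# Closed form: 4! orderings, each with the same product probability.
closed_form = factorial(4) * prod(probs.values())

# Brute force: add up the probability of every length-4 sequence
# whose four letters are all distinct.
distinct_sequences = [
    seq for seq in product(probs, repeat=4) if len(set(seq)) == 4
]
brute_force = sum(
    prod(probs[letter] for letter in seq) for seq in distinct_sequences
)

print(len(distinct_sequences), isclose(closed_form, brute_force))
```

With these assumed values there are 24 qualifying sequences and the probability is 24 * 0.0024 = 0.0576.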

d) This part is a counting problem that uses permutations and combinations. The question asks how many different sequences of length 4 can be made using exactly two different letters.

We have 4 letters to choose from and we have to choose two letters. Let's say we choose A and T. Now, let us count the number of possible sequences by using letters A & T only. The possible sequences are:

Using 'A' thrice and 'T' once: AAAT; AATA; ATAA; TAAA

Using 'T' thrice and 'A' once: TTTA; TTAT; TATT; ATTT

Using both 'A' and 'T' twice: AATT; ATAT; TAAT; TATA; TTAA; ATTA

Total number of sequences for one pair of letters = 4 + 4 + 6 = 14 (equivalently, 2^4 = 16 sequences over two letters, minus the 2 that use only one of them).

Since there are 6 (= 4C2) different ways to choose the two letters, we have a total of 6 * 14 = 84 sequences, by the multiplication principle.
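The count in part d) can be confirmed by brute force. This is a minimal sketch that enumerates all 4^4 = 256 sequences and keeps those with exactly two distinct letters; the variable names are illustrative only.

```python
from itertools import product
from math import comb

LETTERS = "ACTG"

# Brute force: keep every length-4 sequence with exactly two distinct letters.
two_letter_sequences = [
    "".join(seq)
    for seq in product(LETTERS, repeat=4)
    if len(set(seq)) == 2
]
brute_force_count = len(two_letter_sequences)

# Formula: choose the pair of letters, then count the sequences over that
# pair that use both letters (2^4 total, minus the 2 single-letter ones).
per_pair = 2 ** 4 - 2
formula_count = comb(4, 2) * per_pair

print(brute_force_count, formula_count)  # prints "84 84"
```

Both counts agree with the 6 * 14 = 84 obtained by hand.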