Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Do not describe the randomized algorithms or provide conceptual background on ra

ID: 3872387 • Letter: D

Question

Do not describe the randomized algorithms or provide conceptual background on random algorithms. Simply provide a counter example to the proposed question.

Dr. Smart argues that because the randomized algorithms (such as Gibbs sampling algorithm) can identify the motifs consisting one k-mer in each input sequence, the consensus of these k-mers should be always the median string of the same set of input sequences. Is he correct? If yes, explain why. If not, can you give a counter example (i.e., where the consensus of the identified motifs is NOT the median string of the input sequences)?

Explanation / Answer

Randomized Algorithms

In this problem our main focus will be on this following things:

Randomized Algorithms:

This is the algorithms that make random decision rather than the deterministic decisions.

The main advantage is that no input can reliably produce worst-case results because the algorithm runs differently each time

This algorithm is used when we don’t know exact algorithm and any fast algorithm. Under this algorithm there are two types of algorithm and they are:

ii) Greedy Profile Motif Search :it is a probabilistic approach also known as Motif Finding Problem. In this type of approach a list of sequences “t” is given each having length n and used to find the find the “best” pattern of length l that appears in each of the t sequences.

Previously, we solved the Motif Finding Problem using a Branch and Bound or a Greedy technique and now, we randomly select possible locations and find a way to greedily change those locations until we have converged to the hidden motif.

Greedy PWM Motif Search:

GreedyPWMsearch(DNA, t, n, k )

Randomly select starting positions s=(s1,...,st) from DNA

bestScoreß0

while Score(s,DNA) > bestScore

   Form PWM P from s

bestScor ßScore(, DNA)

for i <-- 1 to t

       Find a P-most probable k-mer a from the i th sequence s

si <-- starting position of a

return bestScore.

Since we choose starting positions randomly, there is little chance that our guess will be close to an optimal motif, meaning it will take a very long time to find the optimal motif.

•It is unlikely that the random starting positions will lead us to the correct solution at all.

•In practice, this algorithm is run many times with the hope that random starting positions will be close to the optimum solution simply by chance

Gibbs Sampling

•GreedyProfileMotifSearch is probably not the best way to find motifs.

•However, we can improve the algorithm by introducing Gibbs Sampling, an iterative procedure that discards one k-mer after each iteration and replaces it with a new one.

•Gibbs Sampling proceeds more slowly and chooses new K-mers at random increasing the odds that it will converge to the correct solution

Gibbs Sampling Works

1) Randomly choose starting position s = (s1,...,st) and form the set of k-mers associated with these starting positions

2) Randomly choose one of the t sequences.

3) Create a profile P from the other t -1 sequences.

4) For each position in the removed sequence, calculate the probability that the k-mer starting at that position was generated by P.

5) Choose a new starting position for the removed sequence at random based on the probabilities calculated in step 4.

6) Repeat steps 2-5 until there is no improvement

for example:

Input:
3 5
GGCGTTCAGGCA
AAGAATCAGTCA
CAAGGAGTTCGC
CACGTCAATCAC
CAATAATATTCG

Output:
CAG
CAG
CAA
CAA
CAA

The greedy motif search algorithm iteratively finds k-mers in the 1st, string from Dna, 2nd string from Dna, 3rd string from Dna, etc. After finding i – 1 k-mers Motifs in the first i – 1 strings of Dna, this algorithm constructs Profile(Motifs) and selects the Profile-most probable k-mer from the i-th string based on this profile matix (ties are broken arbitrarily).
To form the initial motif matrix BestMotifs, the algorithm starts by selecting arbitrary k-mers in each string from Dna; the following pseudocode selects the first k-mer in each string.
Greedy MOTIF Search(Dna, k,t)
form a set of k-mers BestMotifs by selecting 1st k-mers in each string from Dna
for each k-mer Motif in the 1st string from Dna
Motif1 Motif
for i = 2 to t
form Profile from motifs Motif1, …, Motifi – 1
Motifi Profile-most probable k-mer in the i-th string in Dna
Motifs (Motif1, …, Motift)
if Score(Motifs) < Score(BestMotifs)
BestMotifs Motifs
output BestMotifs

MOTIF ENUMERATION is unfortunately rather slow for large values of k and d,
1. Implanted Motif: Find all (k, d)-motifs in a collection of strings.
Input: A collection of strings Dna, and integers k and d.
3 1
ATTTGGC
TGCCTTA
CGGTATC
GAAAATT

Output: All (k, d)-motifs in Dna. ATA ATT GTT TTT

Any (k, d)-motif must be at most d mismatches apart from some k-mer appearing in one of the strings of Dna, generate all such k-mers and then check which of them are (k, d)-motifs.

Median String: Find a median string.
Input: A collection of strings Dna and an integer k.
Output: A k-mer Pattern that minimizes d(Pattern, Dna) among all k-mers Pattern.

Given a k-mer Pattern and a set of strings

Dna = {Dna1, … , Dnat}, we define

td(Pattern,Dna)=d(Pattern,Dnai)

i=1

as the sum of distances between Pattern and all strings in Dna. For example, for the following strings Dna,
d(AAA, Dna) = 1 + 1 + 2 + 0 + 1 = 5
:ttaccttAAc 1
gAtAtctgtc 1
Acggcgttcg 2
ccctAAAgag 0
cgtcAgAggt 1

ind a k-mer Pattern that minimizes d(Pattern, Dna) over all k-mers Pattern, call such a Pattern a median string for Dna.

Gibbs Sampler is a more cautious iterative algorithm that discards a single k-mer from the current set of motifs at each iteration and replaces it with a new one (or keeps it). It thus moves with more caution in the space of all motifs, as illustrated below.
It starts with randomly chosen k-mers in each of t DNA sequences, but it makes a random rather than a deterministic choice at each iteration. It uses randomly selected k-mers (Motif1, …, Motift) to come up with another (hopefully higher scoring) set of k-mers.
It randomly selects an integer i between 1 and t and then randomly changes only one Motifi.
Gibbs Sampler(Dna, k, t, N)
randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna
BestMotifs Motifs
for i from 1 to N
i Random(t)
construct profile matrix Profile from all strings in Motifs except for Motifi
Motifi Profile-randomly generated k-mer in the i-th sequence
if Score(Motifs) < Score(BestMotifs)
BestMotifs Motifs
output BestMotifs

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote