ScientistsperformingDNAsequencealignmentoftendesireamoreparametrized way of alig
ID: 3817478 • Letter: S
Question
ScientistsperformingDNAsequencealignmentoftendesireamoreparametrized way of aligning two strings than simply computing a longest common subse- quence between these two strings. One way to achieve such flexibility is to use the Smith-Waterman algorithm, which is generalization of the dynamic pro- gramming algorithm for computing a longest common subsequence so that it can handle weighted scoring functions. In particular, for two strings, X and Y , sup- pose we are given functions, defined on characters, a, from X, and b from Y , as follows: M (a, b) = the positive benefit for a match, if a = b M (a, b) = the negative cost for a mismatch, if a = b I(a) = the negative cost for inserting a at a position in X D(b) = the negative cost of deleting b from some position in Y . Given a string, X, of length n and a string, Y , of length m, describe an algorithm running in O(nm) time for finding the maximum-weight way of transforming X into Y , according to the above weight functions, using the operations of match- ing, substitution, insertion in X, and deletion from Y .
Explanation / Answer
We want to convert the string X into string Y. Let a is in X and b is in Y.
M (a, b) = the positive benefit for a match, if a==b
M(a, b) = the negative weight for a mismatch, i a!=b
I(a)=the negative weight for inserting a at a position in X
D(b) = the negative weight of deleting b from some position in Y
Suppose we need to convert a string X of size m to a string Y of size n with maximum weight.
First we will consider a smaller subproblem and define an optimal structure of the solution.
Let weight ( i , j ) denotes the maximum weight to convert first ' i ' characters of X to first ' j ' characters of Y.
There are 2 possibilities :
1 ) if X ( i ) == Y ( j ) , problem is reduced to finding weight ( i - 1 , j - 1 ) ( no edit required )
2 ) if X ( i ) != Y ( j ) , then we can choose any of the following edit operations :
Insert : problem reduces to finding weight ( i , j - 1 ) + INSERT_weight
i.e we insert the mismatched character and add the weight of insertion.
Delete : problem reduces to finding weight ( i - 1 , j ) + DELETE_weight
Replace : problem reduces to finding weight ( i - 1 , j - 1 ) + REPLACE_weight
Now, our objective is to find maximum edit weight i.e to maximize weight ( i , j ) . So, we will consider the maximum of the values of above subproblems or possibilities and the optimal structure of the solution is defined as :
weight ( i , j ) = max [ weight ( i , j - 1 ) + I , weight ( i - 1 , j ) + D , weight ( i - 1 , j - 1 ) + R ] if X ( i ) != Y ( j ) and
weight ( i , j ) = weight ( i - 1 , j - 1 ) if X ( i ) == Y ( j )
Now, we can find weight ( i , j ) for any values of i and j.
Finally, weight ( m , n ) gives the required maximum edit weight to convert X to Y.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.