previous section previous page next page next section
CMB

Online Lectures on Bioinformatics

navigation


Variants of the dynamic programming algorithm


Local Alignments

In many cases, however, sequences share only a limited region of similarity. This may be a common domain or simply a short region of recognizable similarity. This case is dealt with by so-called local alignment in an algorithm due to Smith and Waterman. Local alignment aims at identifying the best pair of regions, one from each sequence, such that the optimal (global) alignment of these two regions is the best possible. This relies on a scoring scheme that maximizes a similarity score because otherwise an empty alignment would always yield the smallest distance. Naively, the algorithm to compute a local alignment would need to inspect every pair of regions and apply a global alignment algorithm to it. The decisive idea of Smith and Waterman was to offer the maximization in each cell of the matrix a fourth alternative: a zero to signify the beginning of a new alignment. After filling the dynamic programming matrix according to this scheme, backtracking starts from the cell in the matrix that contains the largest value.

Given two strings s and t. The question is: Which one is the highest scoring alignment between any substring of s and any substring of t? The problem may be solved by applying the dynamic programming algorithm to each pair consisting of one substrings of s and one substring of t, what takes time in O(n6).

The Smith-Waterman Algorithm

The Smith-Waterman Algorithm takes time in O(n2). From the point of view of a longest path problem, the algorithm introduces additional edges with weight zero in the edit graph (creating a 'sunken town' showing only the church towers')

The minimum score of the best alignment is zero, as this is the score for the alignment between the empty substring of s and the empty substring of t. The array L(i,j) is initialized as above:


The other entries of the array are calculated analogously as in the algorithm for a global alignment, but instead of three, there are four possibilities now:

For the affine linear gap penalty function g(k)=a+b(k-1), whereat $ 0 \leq b \leq a $, there is


\begin{displaymath}L(i,j) = max \cases {
0\cr
E(i,j)\cr
L(i-1,j-1)+w(s_{i},t_{j})\cr
F(i,j)}
\end{displaymath}

whereat

A zero in the array denotes the beginning of a local alignment. The end of the best local alignment is given with the maximum in the array, where backtracking is started (thinking at the 'sunken-town'-picture, one could say that backtracking is done from the peak of the highest church-tower down to the water-surface).


Comments are very welcome.
luz@molgen.mpg.de