Online Lectures on Bioinformatics

Algorithms for the comparison of two sequencesGeneral gap functions for type II alignmentsBiologically it is important to penalize a long gap not as the sum of single gaps because it is likely to be due to one evolutionary event. The above algorithm for type II alignments falls short of doing so. In fact the paradigm of the type II graph breaks down when gap functions are introduced. First the set of edges would have to be extended to include long horizontal or vertical edges (from (i,j) to (i,j+k) and to (i+k,j)). The attribution of weights to these edges, however, is ambiguous as it depends on the preceding edge in a path. If, e.g., a vertical edge was preceded by another vertical edge, this corresponds to a longer gap, the weight of which is not necessarily the sum of the two edges.
In [WSB76] it is shown that in spite of this complication
an algorithm similar to the one for single indels exists
to calculate an optimal type II alignment with more general gap functions.
They proved that the following algorithm using a gap function g with
produces the optimal alignment ([WSB76], [Wat84b]):
Without having to give the exact (rather lengthy) proof for this algorithm
here,
there still is a way to motivate the above formula for a more restricted class
of biologically reasonable gap functions. When assessing the score of an
alignment a gap will always be seen as one entity of maximal size and not
as two (or more) adjacent gaps of smaller size. The penalty for the gap
should therefore never exceed the sum of the penalties given to its parts.
Mathematically this leads to the requirement of subadditivity:
This condition implies that the maximal vertical or horizontal edge out of those that make up a certain gap is the cheapest one and therefore will be chosen by the optimization procedure. The problem of ambiguities in the assignment of weights to the edges in the graph is hereby circumvented. For a gap function of the form subadditivity is certainly fulfilled as Such a gap function will be called linear. Waterman [Wat84a] introduces also subadditive gap functions.
The explicit algorithm for linear gap functions can now be written
in the following form ([Got82], [Wat84a]):
with This algorithm then involves inspection of a row and column up to the current point for every matrix entry. Gotoh [Got82] showed that this step can be abbreviated to comparison of one new and one old value per row/column. This is due to the fact that the penalties for the gaps of various length all change by the same additive constant when the algorithm proceeds from cell (i,j) to (i+1,j) or (i,j+1). We can therefore formulate the algorithm to calculate a type II alignment under linear gap penalties as follows. Algorithm AII1: where Checking only 5 values at every matrix entry this algorithm has a time complexity of O(n^{2}) just like algorithm AIIs. However, it has the great advantage of allowing for the more useful linear gap penalty functions. This has made it the most commonly used form of sequence alignment algorithm in molecular biology. The same improvement that led to the quadratic complexity here can also be applied to algorithm AI when linear gap penalties are used. Alignment Quiz: 2. Dynamic Programming in C Comments are very welcome. luz@molgen.mpg.de 