previous section previous page next page next section
CMB

Online Lectures on Bioinformatics

navigation


Algorithms for the comparison of two sequences


Type II alignment

The Needleman-Wunsch algorithm was subsequently modified by several authors, especially Sankoff [San72] and Sellers [Sel74]. Two distinctions were made then. Algorithms were formulated for type II alignments instead of type I and for distance measures (looking for minimal alignments) instead of similarity measures (looking for maximal solutions). The problem of similarity and distance will be discussed later. First the algorithm for type II alignments is discussed.

In the definition of the type II alignment the index 0 is used to represent the gap character. This is not to be confused with an index of the point (0,0) which we used in the type I graph.
Definition:
An ordered set of index-pairs is a type II alignment if upon ignoring the 0's all integers between 1 and and between 1 and respectively occur in their natural order in the sequences (xi) and (yi). In formal notation that is




and





Definition:
Given the weight function on the residue pairs. The score s of a type II alignment is defined as


with


In the type II graph the vertices can again be seen as grid points (i,j), but (cf. Figure 2.1). The edges in this graph point from (i,j) to (i+1,j) (corresponding to a pair (i+1,0), i>0 in the alignment, i.e. pairing residue si+1 with a gap character), (i+1,j+1) (corresponding to the pair (i+1,j+1),i,j>0 in the alignment, i.e. pairing residue si+1 with residue tj+1) and (i,j+1) (corresponding to a pair (0,j+1),j>0 in the alignment, i.e. pairing residue tj+1 with a gap character). Unlike in the case of the type I alignment, a gap penalty is now attributed to single indels (horizontal or vertical arcs) only. No gap penalty function g(n) is allowed any more.

The remaining procedure is just as before. A path in a type II graph corresponds to a type II alignment, the scores for the edges are now just the W(i,j), i.e. a penalty g for a vertical or horizontal arc and w(si,tj) for a diagonal one matching two residues. If end-gaps should not be punished the horizontal arcs on the top margin and the vertical ones on the left margin receive weight 0. In the case of two sequences this correspondence between alignment and graph is quite obvious. In the chapter on multiple alignment the type II graph for more than two sequences will be treated formally which then also applies to the above as a special case.

The optimization algorithm can then be applied to the type II graph, resulting in
Algorithm A-II-s


Backtracking is done as in algorithm A-I. Algorithm A-II-s corresponds to the algorithm formulated by Sankoff [San72] and Sellers [Sel74].

For a deeper understanding of the dynamic programming algorithm, you may explore the procedure by means of the Alignment Applet and the other exercises:



Alignment Applet
Alignment Applet


Alignment Quiz: 1.
Alignment Quiz: 1.


Dynamic Programming in Java
Dynamic Programming in Java


This algorithm does not involve looking back along a whole row or column as it does not allow for gap penalty functions. The number of steps to execute it therefore is three times the number of matrix entries that are inspected. This gives the algorithm a time complexity O(n2), where n denotes the length of the longer of the two sequences.



Comments are very welcome.
luz@molgen.mpg.de