previous section previous page next page next section

Online Lectures on Bioinformatics


Variants of the dynamic programming algorithm

Free end-gaps

Depending on the biological setting several kinds of alignments are in use. When the sequences are expected to share similarity extending from the beginning of the sequences to their ends they are aligned globally. This means that each residue of either sequence is part either of a residue pair or a gap. In particular it implies that gaps at the ends are charged like any other gap. This, however, is a particularly unrealistic feature of a global alignment. While sequences may very well share similarity over their entire length (see the example dot plot of two hemoglobin chains in the section on Pairwise sequence comparison), their respective N- and C-termini usually are difficult to match up and differences in length at the ends are more of a rule than an exception. Consequently, one prefers to leave gaps at the ends of the sequences un-penalized. This variant is easy to implement in the dynamic programming algorithm. Two modifications are required. Firstly the initialization of the matrix needs to reflect the gap cost of 0 in the margin of the matrix. Secondly, upon backtracking, one does not necessarily start in the corner of the matrix but much rather searches the margins for the maximum from which to start. Variants of this that penalize only particular end-gaps are easy to derive and can be used, e.g. to fit one sequence into another or to overlap the end of one sequence with the start of another.

Alignment Quiz: 3.
Alignment Quiz: 3.


Figure: Backtracking when end gaps are not penalized.

Fitting a short sequence (a domain) into a long one (a protein)
Test on overlap (Shotgun Sequencing)

Comments are very welcome.