previous section previous page next page next section

Online Lectures on Bioinformatics


Algorithms for the comparison of two sequences

Minimum distance alignments

Based on a similarity function on residue pairs the algorithms introduced above maximize alignment score. Most algorithms, however, were originally based on the opposite approach of using a minimal distance criterion (except for the one by Needleman and Wunsch [NeW70]). This was prompted by the aim to define a metric on the set of sequences, a problem put forward by S. Ulam [Ula72]. An optimal (type II) alignment (xi,yi)i=1...N of two sequences in this context minimizes the sum over the distances d(sxi,tyi) (with neither element being a gap) where a gap penalty is added when a residue is paired with a gap character. In fact the same algorithms that have been introduced above can be used on distances and the two approaches are in a certain sense equivalent. This is expressed by the following theorem due to T. Smith and M. Waterman [SmW81]:
Let {di}i be a (finite) set of distances on the pairs of residues and {si}i a (finite) set of similarities on the pairs, . A gap of length n is given a penalty g'(n) in the distance alignment and g(n) in the similarity alignment. Then every maximum similarity alignment coincides with a minimum distance alignment if

For a finite alphabet there is a finite number of distance or similarity values, say si and di. Let Di be the number of pairs in the alignment of distance di and Sj the number of pairs of similarity sj. Let further Gn be the number gaps of length n. Then the optimal scores s and d of the maximum similarity and minimum distance alignments respectively are:

where the maximum and minimum are taken over all alignments. One can express the sum of the lengths of the two sequences (say N) from the number of pairs which have a certain quality and the number a gaps of a given length:

Without loss of generality let s1 be maxisi Then it follows that

This expression can be used to reformulate s:

Therefore an alignment which maximizes alignment similarity will at the same time minimize the distance defined above according to the above relationships. q.e.d.
Remark: In general d will not be a metric.

Comments are very welcome.