Online Lectures on Bioinformatics

Algorithms for the comparison of two sequencesMinimum distance alignments
Based on a similarity function on residue pairs
the algorithms introduced above maximize alignment score.
Most algorithms, however,
were originally based on the opposite approach of using a minimal
distance criterion
(except for the one by Needleman and Wunsch [NeW70]).
This was prompted by the aim to define a metric on the set
of sequences, a problem put forward by S. Ulam [Ula72].
An optimal (type II) alignment
(x_{i},y_{i})_{i=1...N} of two sequences
in this context minimizes the sum over the distances
d(s_{xi},t_{yi}) (with neither element being a gap)
where a gap penalty is
added when a residue is paired with a gap character.
In fact the same algorithms that have been introduced above can be used
on distances and the two approaches are in a certain sense equivalent.
This is expressed by the following theorem due to T. Smith and
M. Waterman [SmW81]:
Proof: For a finite alphabet there is a finite number of distance or similarity values, say s_{i} and d_{i}. Let D_{i} be the number of pairs in the alignment of distance d_{i} and S_{j} the number of pairs of similarity s_{j}. Let further G_{n} be the number gaps of length n. Then the optimal scores s and d of the maximum similarity and minimum distance alignments respectively are: where the maximum and minimum are taken over all alignments. One can express the sum of the lengths of the two sequences (say N) from the number of pairs which have a certain quality and the number a gaps of a given length: Without loss of generality let s_{1} be max_{i}s_{i} Then it follows that This expression can be used to reformulate s: Therefore an alignment which maximizes alignment similarity will at the same time minimize the distance defined above according to the above relationships. q.e.d. Remark: In general d will not be a metric. Comments are very welcome. luz@molgen.mpg.de 