Algorithms for the comparison of two sequences
Minimum distance alignments
Based on a similarity function on residue pairs
the algorithms introduced above maximize alignment score.
Most algorithms, however,
were originally based on the opposite approach of using a minimal
(except for the one by Needleman and Wunsch [NeW70]).
This was prompted by the aim to define a metric on the set
of sequences, a problem put forward by S. Ulam [Ula72].
An optimal (type II) alignment
(xi,yi)i=1...N of two sequences
in this context minimizes the sum over the distances
d(sxi,tyi) (with neither element being a gap)
where a gap penalty is
added when a residue is paired with a gap character.
In fact the same algorithms that have been introduced above can be used
on distances and the two approaches are in a certain sense equivalent.
This is expressed by the following theorem due to T. Smith and
M. Waterman [SmW81]:
For a finite alphabet there is a finite number of distance or similarity values, say si and di. Let Di be the number of pairs in the alignment of distance di and Sj the number of pairs of similarity sj. Let further Gn be the number gaps of length n. Then the optimal scores s and d of the maximum similarity and minimum distance alignments respectively are:
where the maximum and minimum are taken over all alignments. One can express the sum of the lengths of the two sequences (say N) from the number of pairs which have a certain quality and the number a gaps of a given length:
Without loss of generality let s1 be maxisi Then it follows that
This expression can be used to reformulate s:
Therefore an alignment which maximizes alignment similarity will at the same time minimize the distance defined above according to the above relationships. q.e.d.
Remark: In general d will not be a metric.
Comments are very welcome.