Online Lectures on Bioinformatics

Suboptimal Alignments: Application
The corollary to the above theorem ensures that we can highlight parts of an
optimal alignment by looking at the stable points.
The biological meaning of stable points, however, lies in the fact
that as the suboptimality threshold rises,
the percentage of stable points
that agree with the correct structural alignment of two sequences tends
(monotonically) towards 100%.
This cannot be proven mathematically; it is an empirical result from the
analysis of many example comparisons of different proteins whose
three-dimensional structures are known, so that the quality of the calculated
results can be assessed objectively.
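As a minimal sketch of how such an assessment could be scored, the following Python snippet computes the percentage of stable points that agree with a structural alignment. The function name `agreement_percent`, the representation of an alignment as a list of residue index pairs, and the toy data are assumptions made for illustration; they are not taken from the lecture or from [ViA90].

```python
def agreement_percent(stable_points, structural_pairs):
    """Percentage of stable points that also occur in the structural alignment.

    Both arguments are iterables of (i, j) tuples, meaning that residue i of
    the first sequence is paired with residue j of the second sequence.
    """
    stable = set(stable_points)
    if not stable:
        raise ValueError("no stable points to evaluate")
    correct = stable & set(structural_pairs)
    return 100.0 * len(correct) / len(stable)


# Hypothetical toy data, not taken from the lecture's table.
structural = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 6)]
stable = [(1, 1), (2, 2), (4, 4)]  # (4, 4) is a misaligned pair

print(f"{agreement_percent(stable, structural):.1f}% of stable points are correct")
```

In this toy case two of the three stable points occur in the structural alignment. A stricter threshold that removed the misaligned pair would raise the agreement to 100%.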
The data are summarized in the following table.
The two sequences in case 1 share a rather high sequence similarity (the biologically correct alignment contains 39% identities), and the alignment is consequently very well defined. The two sequences in case 5 are very difficult to align, and consequently the plot shows many alternative points. The data in the table indicate that the 10-stable points agree with the correct alignment in 5 out of 6 cases. We conclude that these parts of an alignment should be trusted more than other points, namely those that allow for alternative 10-suboptimal alignments. Additionally, the table shows that as the threshold increases, misaligned pairs are eliminated from the set of stable points in more of the comparisons. A detailed discussion of these facts in their biological context is given in [ViA90].

The utility of this observation is limited by the fact that for larger thresholds the set of stable points contains very few points. In this case the increased security is gained at too high a cost to be helpful. Only if a reasonable number of residue pairs are established in the set of stable points can the approach of highlighting reliably aligned regions be considered effective. Therefore a quantity, the "extent", is introduced to measure the effectiveness of the stable regions. As sequences, and with them their alignments, can vary in length, the raw count of stable points is not a good measure. Instead we relate this count to the number of residue pairs in the correct structural alignment and define the extent as the number of stable points divided by the number of residue pairs in the structural alignment. Of course this is a biologically reasonable measure only when the set of stable points contains 100% correct points.

These figures (in percent) are given in the last column of the table. The theoretical conclusion that the extent decreases with increasing security is confirmed by these data. The extent of the prediction also seems closely connected to what one intuitively refers to as the "difficulty" of correctly aligning a certain pair of sequences.
For comparisons, however, where the optimal alignment has nothing in common with the biologically correct alignment, these quantities carry no meaning.
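The extent defined above can be illustrated with a short Python sketch. The function name `extent_percent`, the pair-list representation of alignments, and the toy stable sets per threshold are hypothetical and chosen only to show the arithmetic; they do not reproduce the data in the table.

```python
def extent_percent(stable_points, structural_pairs):
    """Extent in percent: number of stable points divided by the number of
    residue pairs in the structural alignment."""
    reference = set(structural_pairs)
    if not reference:
        raise ValueError("structural alignment contains no residue pairs")
    return 100.0 * len(set(stable_points)) / len(reference)


# Hypothetical structural alignment with 8 residue pairs.
structural = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 6), (6, 7), (7, 8), (8, 9)]

# Hypothetical stable sets for increasing thresholds. A larger threshold admits
# more suboptimal alignments, so the stable set can only shrink.
stable_by_threshold = {
    0: [(1, 1), (2, 2), (3, 3), (6, 7), (7, 8)],
    5: [(1, 1), (2, 2), (7, 8)],
    10: [(1, 1), (2, 2)],
}

extents = {t: extent_percent(p, structural) for t, p in stable_by_threshold.items()}
for threshold in sorted(extents):
    print(f"threshold {threshold:2d}: extent = {extents[threshold]:.1f}%")
```

All toy stable points are deliberately chosen from the structural alignment, since the extent is only a meaningful measure when the stable set contains no misaligned pairs.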
Comments are very welcome. luz@molgen.mpg.de 