# Suboptimal Alignments

## Application

The corollary to the above theorem ensures that we can highlight parts of an optimal alignment by looking at the $\delta$-stable points. The biological meaning of $\delta$-stable points, however, lies in the fact that with rising $\delta$ the percentage of $\delta$-stable points that agree with the correct structural alignment of two sequences tends (monotonically) towards 100%. This cannot be proved mathematically; it is the result of analysing many example comparisons of different proteins for which the three-dimensional structures are known, so that the quality of the calculated results can be assessed objectively. The data are summarized in the following table.

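| case number | comparison of | % homology | $\delta$ | % of correct | extent (no. of $\delta$-stable points × 100 / no. of alignable pairs) |
|---|---|---|---|---|---|
| 1 | cytochrome c and cytochrome c2 | 39 | 0 | 92 | 92 |
| | | | 5 | 97 | 87 |
| | | | 10 | 100 | 69 |
| | | | 15 | 100 | 56 |
| 2 | aplysia globin and lamprey globin | 33 | 0 | 84 | 84 |
| | | | 5 | 95 | 77 |
| | | | 10 | 100 | 73 |
| | | | 15 | 100 | 54 |
| 3 | immunoglobulin variable domain: heavy and light chain | 24 | 0 | 62 | 63 |
| | | | 5 | 100 | 55 |
| | | | 10 | 100 | 34 |
| | | | 15 | 100 | 14 |
| 4 | alpha hemoglobin and leghemoglobin | 16 | 0 | 87 | 87 |
| | | | 5 | 97 | 64 |
| | | | 10 | 100 | 41 |
| | | | 15 | 100 | 18 |
| 5 | cytochrome c and cytochrome c551 | 25 | 0 | 49 | 40 |
| | | | 5 | 95 | 25 |
| | | | 10 | 100 | 6 |
| | | | 15 | 100 | 3 |
| 6 | copper binding proteins: azurin and plastocyanin | 22 | 0 | 48 | 47 |
| | | | 5 | 65 | 38 |
| | | | 10 | 85 | 24 |
| | | | 15 | 100 | 5 |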

The two sequences in case 1 share rather high sequence similarity (the biologically correct alignment contains 39% identities), and the alignment is consequently very well defined. The two sequences in case 5 are very difficult to align, and consequently the plot shows many alternative points.

The data in the table indicate that the 10-stable points in 5 out of 6 cases agree with the correct alignment. We conclude that those parts of an alignment should be trusted more than the remaining points, namely the ones that allow for alternative 10-suboptimal alignments. Additionally, the table shows that with increasing $\delta$ there are more comparisons for which misaligned pairs are eliminated from the set of $\delta$-stable points. A detailed discussion of these facts in their biological context is given in [ViA90].
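The "5 out of 6" count can be checked directly against the tabulated percentages. The following is a minimal sketch, not part of the original analysis: the values are transcribed from the "% of correct" column as read from the table, and the names `THRESHOLDS`, `PERCENT_CORRECT` and `fully_correct_cases` are hypothetical.

```python
# Suboptimality thresholds used in the table (assumed reading: 0, 5, 10, 15).
THRESHOLDS = (0, 5, 10, 15)

# Percentage of stable points agreeing with the structural alignment,
# one tuple per case, one entry per threshold (transcribed from the table).
PERCENT_CORRECT = {
    1: (92, 97, 100, 100),
    2: (84, 95, 100, 100),
    3: (62, 100, 100, 100),
    4: (87, 97, 100, 100),
    5: (49, 95, 100, 100),
    6: (48, 65, 85, 100),
}


def fully_correct_cases(threshold):
    """Return the cases whose stable points are 100% correct at this threshold."""
    i = THRESHOLDS.index(threshold)
    return [case for case, row in PERCENT_CORRECT.items() if row[i] == 100]


counts = {t: len(fully_correct_cases(t)) for t in THRESHOLDS}
print(counts)  # {0: 0, 5: 1, 10: 5, 15: 6}
```

At threshold 10 only case 6 still contains misaligned pairs among its stable points, and the number of fully correct cases never decreases as the threshold grows.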

The utility of this observation is limited by the fact that for larger $\delta$ the set of $\delta$-stable points contains very few points. In this case the increased security is gained at too high a cost to be helpful. Only if a reasonable number of residue pairs remain in the set of $\delta$-stable points can the approach of highlighting reliably aligned regions be considered effective. Therefore a quantity ("extent") is introduced to measure the effectiveness of the stable regions. As sequences, and with them their alignments, can vary in length, the raw count of $\delta$-stable points is not a good measure. Instead we relate this count to the number of residue pairs in the correct structural alignment and define the extent as the number of $\delta$-stable points divided by the number of residue pairs in the structural alignment. Of course, this is a biologically reasonable measure only when the set of $\delta$-stable points contains 100% correct points. These figures (in percent) are given in the last column of the table. The theoretical conclusion that the extent decreases with increasing security is confirmed by these data. The extent of the prediction also seems closely connected to what one intuitively refers to as the "difficulty" of correctly aligning a certain pair of sequences. For comparisons, however, where the optimal alignment has nothing in common with the biologically correct alignment, these quantities carry no meaning.
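The two table quantities can be sketched in a few lines, assuming that both the stable points and the structural alignment are given as sets of residue-index pairs `(i, j)`. This representation, the function names and the toy data are illustrative assumptions, not taken from the original work.

```python
def percent_correct(stable, structural):
    """Percentage of stable points that also occur in the structural alignment.

    Returns None when there are no stable points, since the ratio is undefined.
    """
    if not stable:
        return None
    return 100.0 * len(stable & structural) / len(stable)


def extent(stable, structural):
    """Number of stable points as a percentage of the structurally aligned pairs."""
    return 100.0 * len(stable) / len(structural)


# Toy data (hypothetical): five structurally aligned residue pairs,
# four stable points, one of which, (4, 4), is misaligned.
structural = {(1, 1), (2, 2), (3, 3), (4, 5), (5, 6)}
stable = {(1, 1), (2, 2), (3, 3), (4, 4)}

print(percent_correct(stable, structural))  # 75.0
print(extent(stable, structural))           # 80.0
```

In this toy case the extent of 80% overstates the useful coverage, because one of the four stable points is wrong. This is why the extent is only meaningful once the percentage of correct points has reached 100%.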