CMB

Online Lectures on Bioinformatics



Suboptimal Alignments


Application

The corollary to the above theorem ensures that we can highlight parts of an optimal alignment by looking at the δ-stable points. The biological significance of δ-stable points, however, lies in the observation that as δ increases, the percentage of δ-stable points that agree with the correct structural alignment of the two sequences tends (monotonically) towards 100 %. This cannot be proven mathematically. It is the result of analysing many example comparisons of proteins whose three-dimensional structures are known, so that the quality of the computed results can be assessed objectively. The data are summarized in the following table.

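case  comparison of             % homology    δ   % correct   extent (%)
1     cytochrome c and              39        0       92          92
      cytochrome c2                           5       97          87
                                             10      100          69
                                             15      100          56
2     aplysia globin and            33        0       84          84
      lamprey globin                          5       95          77
                                             10      100          73
                                             15      100          54
3     immunoglobulin variable       24        0       62          63
      domain: heavy and                       5      100          55
      light chain                            10      100          34
                                             15      100          14
4     alpha hemoglobin and          16        0       87          87
      leghemoglobin                           5       97          64
                                             10      100          41
                                             15      100          18
5     cytochrome c and              25        0       49          40
      cytochrome c551                         5       95          25
                                             10      100           6
                                             15      100           3
6     copper binding proteins:      22        0       48          47
      azurin and plastocyanin                 5       65          38
                                             10       85          24
                                             15      100           5

% homology: percentage of identities in the biologically correct (structural) alignment. δ: the suboptimality parameter defining the δ-stable points. % correct: percentage of δ-stable points that agree with the structural alignment. extent (%): number of δ-stable points × 100 / number of alignable residue pairs in the structural alignment.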

The two sequences in case 1 share a rather high sequence similarity (the biologically correct alignment contains 39 % identities), and the alignment is consequently very well defined. The two sequences in case 5, in contrast, are very difficult to align, and consequently the plot shows many alternative points.

The data in the table indicate that in 5 out of 6 cases all 10-stable points agree with the correct alignment. We conclude that these parts of an alignment deserve more trust than the remaining points, namely those that allow for alternative 10-suboptimal alignments. The table also shows that as δ increases, misaligned pairs are eliminated from the set of δ-stable points in more and more of the comparisons. A detailed discussion of these facts in their biological context is given in [ViA90].
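As a quick consistency check, the values from the table can be encoded and the trends described in the text verified mechanically. The following Python sketch counts, for each δ, the comparisons in which all δ-stable points are correct. It also tests whether % correct never decreases and the extent never increases as δ grows. The names `TABLE`, `cases_fully_correct` and `is_monotone` are hypothetical and chosen only for this illustration. The numbers are transcribed from the table as given.

```python
# Hypothetical encoding of the table: for each case, a short label and the
# (% correct, extent) pairs for delta = 0, 5, 10, 15, transcribed as given.
DELTAS = (0, 5, 10, 15)

TABLE = {
    1: ("cytochrome c / cytochrome c2", [(92, 92), (97, 87), (100, 69), (100, 56)]),
    2: ("aplysia globin / lamprey globin", [(84, 84), (95, 77), (100, 73), (100, 54)]),
    3: ("Ig variable domain: heavy / light", [(62, 63), (100, 55), (100, 34), (100, 14)]),
    4: ("alpha hemoglobin / leghemoglobin", [(87, 87), (97, 64), (100, 41), (100, 18)]),
    5: ("cytochrome c / cytochrome c551", [(49, 40), (95, 25), (100, 6), (100, 3)]),
    6: ("azurin / plastocyanin", [(48, 47), (65, 38), (85, 24), (100, 5)]),
}


def cases_fully_correct(table, deltas):
    """For each delta, count the cases in which all delta-stable points are correct."""
    return {
        delta: sum(1 for _, rows in table.values() if rows[k][0] == 100)
        for k, delta in enumerate(deltas)
    }


def is_monotone(values, increasing=True):
    """True if values never decrease (increasing=True) or never increase (increasing=False)."""
    steps = list(zip(values, values[1:]))
    if increasing:
        return all(a <= b for a, b in steps)
    return all(a >= b for a, b in steps)


if __name__ == "__main__":
    print(cases_fully_correct(TABLE, DELTAS))  # {0: 0, 5: 1, 10: 5, 15: 6}
    for case, (label, rows) in TABLE.items():
        correct = [c for c, _ in rows]
        extent = [e for _, e in rows]
        print(case, label, is_monotone(correct), is_monotone(extent, increasing=False))
```

For these six comparisons every case prints `True True`. The fraction of correct δ-stable points never drops as δ grows, while the extent steadily shrinks.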

The utility of this observation is limited by the fact that for larger δ the set of δ-stable points contains very few points. In that case the increased certainty is gained at too high a cost to be helpful. The approach of highlighting reliably aligned regions can be considered effective only if a reasonable number of residue pairs are established as δ-stable.

Therefore a quantity called the "extent" is introduced to measure the effectiveness of the stable regions. Sequences, and with them their alignments, vary in length, so simply counting the δ-stable points is not a good measure. Instead we relate this count to the number of residue pairs in the correct structural alignment. The extent is defined as the number of δ-stable points divided by the number of residue pairs in the structural alignment. Of course, this is a biologically reasonable measure only when the δ-stable points are 100 % correct. These figures (in percent) are given in the last column of the table. The data confirm the theoretical conclusion that the extent decreases as certainty increases. The extent of the prediction also seems closely connected to what one intuitively calls the "difficulty" of correctly aligning a given pair of sequences. However, for comparisons where the optimal alignment has nothing in common with the biologically correct alignment, these quantities carry no meaning.
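Both percentages in the table follow directly from their definitions if the δ-stable points and the structural alignment are each represented as a set of aligned residue pairs (i, j). The sketch below assumes exactly that representation. The function names `percent_correct` and `extent` and the toy alignment are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Set, Tuple

# A residue pair (i, j): position i of the first sequence aligned with position j
# of the second. Representing alignments as sets of such pairs is an assumption.
Pair = Tuple[int, int]


def percent_correct(stable: Set[Pair], structural: Set[Pair]) -> float:
    """Percentage of delta-stable points that also occur in the structural alignment."""
    if not stable:
        raise ValueError("no delta-stable points: the percentage is undefined")
    return 100.0 * len(stable & structural) / len(stable)


def extent(stable: Set[Pair], structural: Set[Pair]) -> float:
    """Number of delta-stable points x 100 / number of residue pairs in the structural alignment.

    Biologically meaningful only when percent_correct(stable, structural) == 100.
    """
    if not structural:
        raise ValueError("the structural alignment contains no residue pairs")
    return 100.0 * len(stable) / len(structural)


if __name__ == "__main__":
    # Hypothetical structural alignment of 10 residue pairs; residue 4 of the
    # second sequence is left unaligned.
    structural = {(i, i) for i in range(1, 4)} | {(i, i + 1) for i in range(4, 11)}
    # Hypothetical set of delta-stable points: four pairs, all in the structural alignment.
    stable = {(1, 1), (2, 2), (3, 3), (4, 5)}
    print(percent_correct(stable, structural))  # 100.0
    print(extent(stable, structural))  # 40.0
```

The extent by itself says nothing about correctness. A partly incorrect set of stable points can still have a large extent, which is why the measure is only interpreted when all δ-stable points are correct.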


Comments are very welcome.
luz@molgen.mpg.de