Suboptimal Alignments
Application
The corollary to the above theorem ensures that we can highlight parts of an
optimal alignment by looking at the stable points for a given suboptimality
threshold.
The biological meaning of these stable points, however, lies in the fact
that, as the threshold rises, the percentage of stable points
that agree with the correct structural alignment of two sequences tends
(monotonically) towards 100%.
This does not allow for a mathematical proof but is the result of the
analysis of many example comparisons of different proteins where the three
dimensional structures are known and the quality of the calculated results can
be assessed objectively.
The data are summarized in the following table.
| case number | comparison of | % homology | threshold | % of stable points correct | extent x 100 (no. of stable points / no. of alignable pairs) |
|---|---|---|---|---|---|
| 1 | cytochrome c and cytochrome c2 | 39 | 0 5 10 15 | 92 97 100 100 | 92 87 69 56 |
| 2 | aplysia globin and lamprey globin | 33 | 0 5 10 15 | 84 95 100 100 | 84 77 73 54 |
| 3 | Immunoglobulin variable domain: heavy and light chain | 24 | 0 5 10 15 | 62 100 100 100 | 63 55 34 14 |
| 4 | alpha hemoglobin and leghemoglobin | 16 | 0 5 10 15 | 87 97 100 100 | 87 64 41 18 |
| 5 | cytochrome c and cytochrome c551 | 25 | 0 5 10 15 | 49 95 100 100 | 40 25 6 3 |
| 6 | copper binding proteins: azurin and plastocyanin | 22 | 0 5 10 15 | 48 65 85 100 | 47 38 24 5 |
The sequences in case 1 share a rather high sequence similarity (the
biologically correct alignment contains 39% identities),
and the alignment is consequently very well
defined. The two sequences in case 5 are very difficult to align, and
consequently the plot shows many alternative points.
The data in the table indicate that in 5 out of 6 cases the 10-stable points
all agree with the correct alignment. We conclude that these parts of an
alignment should be trusted more than the remaining points, namely those that
allow for alternative 10-suboptimal alignments.
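
The count quoted above can be checked directly against the table. The
following is a minimal sketch that transcribes the thresholds and the
"% correct" column into Python; the names THRESHOLDS, PERCENT_CORRECT,
fully_correct_cases and is_non_decreasing are chosen here for illustration
and do not come from the original work.

```python
# Thresholds and "% correct" values transcribed from the table above.
# All names in this sketch are illustrative assumptions.
THRESHOLDS = (0, 5, 10, 15)
PERCENT_CORRECT = {
    1: (92, 97, 100, 100),   # cytochrome c / cytochrome c2
    2: (84, 95, 100, 100),   # aplysia globin / lamprey globin
    3: (62, 100, 100, 100),  # immunoglobulin heavy / light chain
    4: (87, 97, 100, 100),   # alpha hemoglobin / leghemoglobin
    5: (49, 95, 100, 100),   # cytochrome c / cytochrome c551
    6: (48, 65, 85, 100),    # azurin / plastocyanin
}


def fully_correct_cases(threshold):
    """Case numbers whose stable points are 100% correct at `threshold`."""
    col = THRESHOLDS.index(threshold)
    return [case for case, row in sorted(PERCENT_CORRECT.items())
            if row[col] == 100]


def is_non_decreasing(row):
    """True if the values never drop from one threshold to the next."""
    return all(a <= b for a, b in zip(row, row[1:]))


if __name__ == "__main__":
    # Cases that are entirely correct at threshold 10.
    print(fully_correct_cases(10))
    # Does the percentage of correct points rise monotonically in every case?
    print(all(is_non_decreasing(row) for row in PERCENT_CORRECT.values()))
```

For threshold 10 the function returns cases 1 to 5, which matches the
"5 out of 6" statement; case 6 reaches 100% only at threshold 15.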
Additionally, the table shows that with an increasing threshold
there are
more comparisons for which misaligned pairs are eliminated from the
set of stable points. A detailed discussion of these facts
in their biological context is given in [ViA90].
The utility of this observation is limited by the fact that
for larger thresholds
the set of stable points contains very few points. In this case the increased
security
was gained at too high a cost to be helpful. Only if a reasonable number
of residue pairs are established in
the set of stable points
can the approach of
highlighting reliably aligned regions be considered effective. Therefore
a quantity ("extent") is introduced to measure the effectiveness
of the stable regions. As sequences, and
with them their alignments, can vary in length,
simply counting the stable points
is not a good measure. Instead we relate this count to the number of residue
pairs in the correct structural alignment and define the extent
as the number of stable points divided by
the number of residue pairs in the structural alignment.
Of course this is a biologically reasonable measure only when
the set of stable points contains 100% correct points.
These figures (in percent) are given
in the last column of the table. The theoretical
conclusion that the extent decreases with increasing security is confirmed
by these data.
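
The definition of the extent can be illustrated with a short sketch. It
assumes that an aligned residue pair is represented as a tuple (i, j) of
positions in the two sequences and that the stable points and the structural
alignment are given as collections of such tuples; this representation, the
function names and the toy data are assumptions made for illustration only.
The extent values in EXTENT_X100 are transcribed from the last column of the
table.

```python
# Hypothetical representation: a residue pair is a tuple (i, j) of positions
# in the first and second sequence. All names here are illustrative.

def extent(stable_points, structural_pairs):
    """Number of stable points divided by the number of residue pairs
    in the structural alignment."""
    structural = set(structural_pairs)
    if not structural:
        raise ValueError("structural alignment contains no residue pairs")
    return len(set(stable_points)) / len(structural)


def percent_correct(stable_points, structural_pairs):
    """Percentage of stable points that also occur in the structural alignment."""
    stable = set(stable_points)
    if not stable:
        raise ValueError("no stable points given")
    return 100.0 * len(stable & set(structural_pairs)) / len(stable)


# Extent x 100 for thresholds 0, 5, 10, 15, transcribed from the table.
EXTENT_X100 = {
    1: (92, 87, 69, 56),
    2: (84, 77, 73, 54),
    3: (63, 55, 34, 14),
    4: (87, 64, 41, 18),
    5: (40, 25, 6, 3),
    6: (47, 38, 24, 5),
}


def is_non_increasing(row):
    """True if the values never rise from one threshold to the next."""
    return all(a >= b for a, b in zip(row, row[1:]))


if __name__ == "__main__":
    # Toy data, not taken from any real protein comparison.
    structural = {(1, 1), (2, 2), (3, 3), (4, 5)}
    stable = {(1, 1), (2, 2), (3, 4)}
    print(extent(stable, structural))           # 3 stable points / 4 pairs
    print(percent_correct(stable, structural))  # 2 of the 3 points are correct
    print(all(is_non_increasing(r) for r in EXTENT_X100.values()))
```

In the toy example the extent is 0.75, but only two of the three stable
points are correct, so by the caveat above the extent would not yet be a
meaningful measure for that data.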
The extent of the prediction also seems closely connected
to what one intuitively
refers to as the "difficulty" of correctly aligning a certain pair of sequences.
For comparisons, however, where the optimal alignment has nothing
in common with the biologically correct alignment, these quantities carry
no meaning.
Comments are very welcome.
luz@molgen.mpg.de