previous section previous page next page next section

Online Lectures on Bioinformatics


Suboptimal Alignments


As pointed out in a previous section two proteins may share the same three-dimensional fold. In this case it is possible to superimpose the two protein structures in space and derive a list of corresponding amino acids from the two sequences. This will in general be a type I alignment as it is based on residue pairs. In our context this alignment is the standard of truth we want to approximate.

The practical application of the algorithms to this problem introduced above requires specification of the parameters involved. With a view also to later applications we will use algorithm A-I with a linear gap penalty function g(k) = a + b(k-2). The same improvement that lead to algorithm A-II-l is applicable such that we are dealing with an O(n2) algorithm. Gaps at the ends of the sequences will not be penalized. The weight function (in biological context referred to as scoring matrix) by Dayhoff [DBH83] that was discussed in section 2.1 has been widely accepted and will not be questioned in the current work. In fact there is no choice of algorithms and parameters known which, applied to any pair of sequences, would result in an optimal alignment that perfectly matches the structural one [RVA89].

It would therefore be of interest to distinguish correct parts of an alignment from those that disagree with the structural comparison. The technical tool to tackle this problem is the calculation and analysis of suboptimal solutions to the alignment problem. By a suboptimal solution we mean one that has a score very close to the optimal score. Studying whether and how close to the optimal solution suboptimals occur, gives an idea of the degree of relatedness between two sequences. Furthermore we can use it to delineate regions in an alignment which are more likely to coincide with the correct alignment than others. This idea is referred to as local reliability of an alignment. Suboptimal alignments are useful not only in the analysis of two-sequence comparisons but they will also be applied to derive heuristics for the simultaneous comparison of multiple sequences.

Comments are very welcome.