previous section previous page next page next section

Online Lectures on Bioinformatics


Algorithms for SP-optimal multiple alignments

Divide-and-Conquer Alignment

The Divide-and-Conquer Alignment [Tö96] [St98] is a a fast heuristic algorithm for multiple sequence alignment which provides near-to-optimal results for sufficiently homologous sequences.

The main idea is first to cut the sequences several times at certain points to reduce the length of the sequences, second to align the cutted sequences and third to concatenate the multiple alignments:

The problem is to find the cut positions:

optimal cut positions, which produce an optimal alignment
  • exist: Just imagine you know the optimal alignment in advance.
  • cannot be found efficiently: optimal multiple alignment is NP-complete.

C-optimal cut positions minimize

\begin{displaymath}C(c_1,c_2,\ldots,c_n) = \sum_{l<k}C^{k,l}(c_k,c_l) \end{displaymath}

where C is the modified forward-backward-matrix:

Cijk,l = Sopt(s(k),s(l)) - (Dijk,l + Rijk,l)

The method needs time in $ \mathcal{O}(L^{k-1}) $and space in $ \mathcal{O}(L^2n^2) $. While time still grows exponentially in the number of sequences, the space is reduced to quadratic. There also exist heuristics which relax from the requirement that $ C(c_1,c_2,\ldots,c_n) $ is truly minimized which lead to an $ \mathcal{O}(L^2n^2) $ time and space algorithm.


Comments are very welcome.