Lectures by the Computational Molecular Biology Department at the Max Planck Institute for Molecular Genetics

Statistics, Probability, and Algorithms in Bioinformatics (SPAß)

(Algorithmen für Stochastik und Stochastik für Algorithmen)

Lecture and Exercises

Vorlesung (Lecture) Übung (Exercises)
Nr. 19 706 19 706
SWS 2 2
Credits 3 3
Time Wed 14:00 Wed 15:30
Place MATH
025/026
Arnimallee 2-6
MATH
025/026
Arnimallee 2-6
Dozenten Sven Rahmann
Steffen Grossmann
Sven Rahmann
Steffen Grossmann
 

New + Important

11.02. In the week before the Klausur, an uncorrected version of the lecture notes sources and the resulting pdf is available.

03.12. Because of the ongoing strike, today's lecture will take place at the MPI for Molecular Genetics, Ihnestr. 73, 3rd floor in the small library room.

26.11. Because of the strike, the lecture could not take place today. The problem sets that were due today can be handed in next week. Also, be prepared for more and more difficult problems next week. You might also start reading about the accept-reject method for random number generation. (Even though there is a strike, there is no reason to learn less.)

19.11. About note-taking: I will say in class how to obtain the most recent version of the lecture notes and the source files. Please do not change any of those files or hand in modified versions. Instead, please send me your notes in a file notes_yourname.tex. It should contain a chapter and any new commands that you might need (but please use the provided commands as far as possible). Look at previous lecture notes for examples.

31.10. In problem 6 of the exercises, the file intronlengths.txt that you can download from the website, contains the lengths of individual introns, not the length distribution! So the mean intron length is approximately 882. Thanks to Jonas, and sorry about the confusion.

28.10. Lectures start each Wednesday at 14:00, not 14:15. They take place in Room 025/026 in the Pi-building, Arnimallee 2-6. The exercises start immediately after the lecture, at 15:30.

16.10. I put the first set of exercises online beforehand, so you know what to expect...

Summary

This lecture is about fun (Spaß): Statistics, Probability, and Algorithms in Bioinformatics. In other words, we'll be looking at the many situations where stochastics and algorithms meet. After this course, the successful participant will be well prepared with a toolbox of methods from both worlds and their applications to sequence analysis.

Date Topics Materials
22.10.Distributions and their representation. Parametric and non-parametric distributions. Discrete and continuous distributions. The IEEE floating point standard (you may use this document as a reference). Problem Set 1
printieee.c
29.10.Important parametric distributions and their pdf (pmf, ff), cdf, qf. Moments of distributions. These pages at MathWorld contain all the information you'll ever need to know (and more).Problem Set 2
Intron lengths
5.11.Computing with small probabilities. The lngamma function. See Numerical Recipes in C. Problem Set 3
12.11.Generation of uniform random numbers.Problem Set 4
19.11. Generation of random numbers with given distributions: Distribution-specific methods Problem Set 5
26.11.The lecture could not take place because of the strike
3.12. Accept-reject method for random number generation. Reminder on statistical testing. Location: MPI for Molecular Genetics, Ihnestr. 73, 3rd floor, small library. Problem Sets 6+7
10.12. Quality tests for random number generators. Problem Set 8
17.12.Sampling of combinatorial structures and objects with specified properties. Problem Set 9
7.1.Statistics of sequence alignment I (Steffen Grossmann). Problem Set 10
14.1.Statistics of sequence alignment II (Steffen Grossmann). Problem Set 11
21.1.MCMC methods. Problem Set 12
28.1.MCMC examples. Simulated annealing. Problem Set 13
4.2.Importance Sampling. Generating functions.
11.2.Review.
18.2.Final exam (Klausur).

Scheinkriterien

Vorlesung und Übung können nur zusammen belegt werden. Die beiden Teilnoten ergeben gemittelt die Gesamtnote.

Vorlesung (3 credits):
Ausarbeiten eines Skripts zu einer Vorlesungsstunde mit LaTeX (50% der Note); Bestehen der Klausur am Ende (50% der Note). Beide Noten müssen mindestens ausreichend (4.0) sein.
Übungen (3 credits):
Die Note richtet sich nach der erreichten Gesamtpunktzahl in den Übungsaufgaben. Zum Bestehen sind mindestens 50% zu erreichen. Ein großer Teil der Aufgaben wird aus Projekten bestehen. Eventuell lassen sich aus einzelnen Projekten Themen für Master-Arbeiten ableiten.

Literatur

  • William H. Press, Brian P. Flannery, Saul A. Teukolsky und William T. Vetterling
    Numerical Recipes in C, Second Edition
    Cambridge University Press
    A classic.
  • John A. Rice
    Mathematical Statistics and Data Analysis, Second Edition
    Duxbury Press
    Relatively expensive, but easy-to-read introductory statistics text.
  • W.N. Venables, D.M. Smith, and the R Development Core Team
    An introduction to R (Notes on R: A Programming Environment for Data Analysis and Graphics, Version 1.8.0, 2003-10-08)
    Official web page of the R project: http://www.r-project.org
    R is a free version of the commercial Splus software. Look here for further books on R.
  • Christian P. Robert und George Casella
    Monte Carlo Statistical Methods
    Springer
    Not very easy to read. Has a lot of material about random number generation and MCMC methods. Contains quite a number of misprints for a SPringer statistics text.
  • Ronald L. Graham, Donald E. Knuth und Oren Patashnik
    Concrete Mathematics
    Addison-Wesley
    Contains all the mathematics a programmer or algorithmician will ever need -- and more. Not very easy material, but extremely well written..
  • Robert Sedgewick und Philippe Flajolet
    Analysis of Algorithms
    Addison-Wesley
    Has many examples about how to use generating functions for the analysis of algorithms.

Weitere folgen. In der Vorlesung werde ich jeweils einzelne Themen und Beispiele aus den genannten Büchern herausgreifen.

Zur Vorlesung soll von den Teilnehmern ein Skript erstellt werden. Deswegen wird jeder Teilnehmer mindestens einmal eine Vorlesungsstunde als LaTeX Skript nachbereiten. Dies macht 50% der Vorlesungsnote aus.

Links

Homepage der Abteilung Computational Molecular Biology am Max-Plack-Institut für Molekulare Genetik.

Zum Studiengang Bioinformatik an der FU Berlin.

 

Anmerkungen und Fragen zu dieser Seite bitte an Sven Rahmann. Letzte Änderung: Wednesday, 11-Feb-2004 18:54:38 CET