Prepared by: Jean-Baka Domelevo Entfellner
Module Name: Evolution and Phylogenetics
Contact hours (to be used as a guide): Total (40 hours), Theory (45%), Practical (55%)
SPECIFIC OUTCOMES ADDRESSED
On completion of this module, students should:
1. Understand the mathematical modeling of the evolution of quantitative or qualitative biological traits (some ideas of Markovian processes, probability distribution of the time between two mutation events, between two speciation events, etc.).
2. Be able to effectively build phylogenies from biological sequences, using software from the scientific community, and set their parameters accurately.
3. Be able to analyze and assess the quality of phylogenetic trees.
BACKGROUND KNOWLEDGE REQUIRED
H3ABioNet bioinformatics modules as pre-requisites: Sequence Analysis
Additional: Basic general-purpose scientific knowledge, basic arithmetic skills, and some familiarity with probabilities and statistics. Basic knowledge of the theory of evolution and of the inheritance of biological traits.
BOOKS & OTHER SOURCES USED
1. Inferring Phylogenies, by Joseph Felsenstein, published by Sinauer Associates, U.S.A., 2004. ISBN 0878931775, 978-0-878-93177-4
2. Statistical Methods in Molecular Evolution, by Rasmus Nielsen (Ed.) et al., Springer, 2005. ISBN 978-0-387-27733-2
3. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, by Durbin, Eddy, Krogh and Mitchison, Cambridge University Press, 1998. ISBN 0521629713, 978-0-521-62971-3
A) Theory lectures
1. Basic concepts of biological evolution. Reference to Darwin’s fundamental ideas. Concept of species (definition by Ernst Mayr), speciation and extinction events. (1 hour(s))
2. Evolutionary events on biological sequences: mutations, deletions and insertions. Several types of indels: one or several characters, possibly repeated (tandem repeats). (0.5 hour(s))
3. Two scales of study: “small” biological sequences (e.g. genes, RNAs, proteins, etc.) or larger ones (e.g. full chromosomes or full genomes). The evolutionary events differ according to the object of study:
– indels on “small” sequences: one or a few characters, possibly repeated (tandem repeats) → standard modeling with gap-open and gap-extend penalties during progressive alignment. Coding of the ‘-’ character on the leaves of a tree. (1 hour(s))
– rearrangements on “large” sequences: gene deletion, gene duplication, chromosomal inversion, horizontal transfers, translocations, etc., which call for more complex models. (1 hour(s))
4. The Tree of Life: our current understanding of evolution linking all living species. (1 hour(s))
5. Naïve approaches to phylogeny reconstruction: majority rule, Maximum Parsimony.
6. Phylogenetic trees as generative models, likelihood of a tree as the probability to generate the observed data from it. Maximum Likelihood approaches. (2 hour(s))
7. The space of all tree topologies is huge: exploratory techniques (NNI, SPR, TBR). (0.5 hour(s))
8. The Bayesian viewpoint: what is the a priori probability of a model? (2 hour(s))
9. Introduction to Markovian processes as the standard of evolutionary models. Expected time between two speciation events (not quite tackled in this course), expected time between two mutation events, matrices Q of instantaneous mutation rates, fundamental relation Pr(a → b | t) = [e^(Qt)]_(a,b).
10. Distance methods: distances between sequences from pairwise alignments. Distance matrices. NJ and UPGMA algorithms. (1 hour(s))
11. Reflecting on the duality between multiple sequence alignment and phylogenetic tree: MSA → sites → phylogeny based on the inferred evolutionary history on the sites, and phylogeny → guide tree → MSA obtained by progressive alignment methods. (1 hour(s))
12. Phylogenies on quantitative traits: the contrasts method. (1 hour(s))
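To illustrate item 5 (Maximum Parsimony), the sketch below scores a few candidate topologies with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a rooted binary tree needs to explain each site. The tree encoding (nested tuples of leaf names), the function names and the four toy sequences are assumptions made for this example only; they do not come from any particular software package.

```python
def fitch_site(tree, states):
    """Return (possible_states, n_changes) for one site on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left_subtree, right_subtree).
    `states` maps each leaf name to the character observed at this site.
    """
    if isinstance(tree, str):                      # leaf: observed character
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:                                     # children agree: no extra change
        return common, left_cost + right_cost
    # children disagree: keep the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch score over all sites of an alignment {name: sequence}."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Toy data (invented for illustration): four taxa, four aligned sites.
alignment = {"A": "AAGT", "B": "AAGA", "C": "TCGA", "D": "TCGT"}
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_score(t, alignment) for label, t in candidates.items()}
best = min(scores, key=scores.get)
```

With four taxa there are only three unrooted topologies, so all of them can be scored exhaustively; for larger data sets this is no longer feasible, which motivates the search heuristics of item 7.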
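To illustrate item 6 (likelihood of a tree as the probability of generating the observed data), here is a minimal sketch of Felsenstein's pruning algorithm. It assumes the Jukes-Cantor (JC69) substitution model, a uniform distribution of states at the root, and branch lengths expressed in expected substitutions per site. The tree encoding (each internal node is a tuple of (child, branch_length) pairs) and all function names are hypothetical choices made for this example.

```python
import math

STATES = "ACGT"


def jc69_prob(a, b, t):
    """Pr(a -> b | t) under JC69, t in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def partial_likelihoods(node, site):
    """Return {state: Pr(leaves observed below `node` | `node` is in state)}.

    A leaf is a name (str); an internal node is a tuple of
    (child, branch_length) pairs. `site` maps leaf names to characters.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == site[node] else 0.0 for s in STATES}
    partial = {s: 1.0 for s in STATES}
    for child, branch_length in node:
        below = partial_likelihoods(child, site)
        for s in STATES:
            # sum over the unknown state x at the lower end of the branch
            partial[s] *= sum(
                jc69_prob(s, x, branch_length) * below[x] for x in STATES
            )
    return partial


def site_likelihood(tree, site):
    """Likelihood of one site, with uniform state frequencies at the root."""
    root = partial_likelihoods(tree, site)
    return sum(0.25 * root[s] for s in STATES)


def log_likelihood(tree, alignment):
    """Log-likelihood of an alignment {name: sequence}, sites independent."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {n: seq[i] for n, seq in alignment.items()}))
        for i in range(n_sites)
    )


# Toy tree and data (invented for illustration).
cherry_ab = (("A", 0.1), ("B", 0.1))
cherry_cd = (("C", 0.1), ("D", 0.1))
tree = ((cherry_ab, 0.2), (cherry_cd, 0.2))
alignment = {"A": "AAGT", "B": "AAGA", "C": "TCGA", "D": "TCGT"}
lnl = log_likelihood(tree, alignment)
```

Maximum Likelihood software additionally optimizes branch lengths and model parameters and searches over topologies; this sketch only evaluates one fixed tree.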
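To give a sense of the claim in item 7 that the space of tree topologies is huge, the following sketch counts the unrooted, fully resolved topologies on n labelled leaves, which is the double factorial (2n-5)!!, and the number of trees reachable from a given tree by a single NNI move, which is 2(n-3). The function names are chosen for this example.

```python
def count_unrooted_topologies(n_leaves):
    """Number of unrooted binary topologies on n labelled leaves: (2n-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # The k-th leaf (k = 4..n) can be attached to any of the 2k-5 branches
    # of a tree that already holds k-1 leaves.
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5
    return count


def count_nni_neighbours(n_leaves):
    """Number of distinct trees one NNI move away from an unrooted binary tree.

    Each of the n-3 internal branches allows two alternative rearrangements.
    """
    if n_leaves < 4:
        return 0
    return 2 * (n_leaves - 3)


for n in (4, 5, 10, 20, 50):
    print(n, count_unrooted_topologies(n), count_nni_neighbours(n))
```

The contrast between the two columns shows why local moves such as NNI, SPR and TBR are used: each step inspects a small neighbourhood instead of enumerating the whole tree space.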
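The fundamental relation of item 9, Pr(a → b | t) = [e^(Qt)]_(a,b), can be checked numerically on a small case. The sketch below assumes the Jukes-Cantor (JC69) rate matrix, where every off-diagonal rate equals alpha, and computes e^(Qt) with a truncated Taylor series written in plain Python. The truncated series is only adequate when the entries of Qt are small; in practice a library routine such as scipy.linalg.expm would be preferred. Function names are illustrative.

```python
import math

STATES = "ACGT"


def jc69_rate_matrix(alpha):
    """JC69 instantaneous rate matrix Q: alpha off the diagonal, rows sum to 0."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, n_terms=60):
    """Approximate P(t) = exp(Qt) by the series sum_k (Qt)^k / k!."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]              # (Qt)^0 / 0! = identity
    for k in range(1, n_terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def expected_waiting_time(q, state_index):
    """Mean of the exponential waiting time before leaving a state: -1 / q_ii."""
    return -1.0 / q[state_index][state_index]


alpha = 1.0 / 3.0          # one expected substitution per unit of time
Q = jc69_rate_matrix(alpha)
P = transition_matrix(Q, t=0.5)

# Closed-form JC69 probabilities, for comparison with the series.
decay = math.exp(-4.0 * alpha * 0.5)
p_same = 0.25 + 0.75 * decay
p_diff = 0.25 - 0.25 * decay
```

Each row of P sums to 1, and as t grows every entry tends to 1/4, the stationary distribution of JC69.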
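For item 10 (distance methods), the following sketch computes pairwise distances from aligned sequences and clusters them with UPGMA. It assumes the proportion of differing sites (p-distance) corrected with the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), ignores columns containing a gap, and returns only the topology as nested tuples (branch lengths are left out for brevity). The sequences and function names are invented for this example.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns where either has a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance; infinite when p reaches saturation."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Map frozenset({name1, name2}) -> JC69 distance for every pair."""
    return {
        frozenset((a, b)): jc69_distance(p_distance(alignment[a], alignment[b]))
        for a, b in combinations(alignment, 2)
    }


def upgma(names, dist):
    """Return the UPGMA topology as nested tuples of leaf names."""
    clusters = {name: 1 for name in names}      # cluster label -> leaf count
    d = dict(dist)
    while len(clusters) > 1:
        # closest pair of current clusters (first one wins in case of ties)
        a, b = min(combinations(clusters, 2), key=lambda ab: d[frozenset(ab)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # size-weighted average of the distances to the two merged clusters
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    (root,) = clusters
    return root


alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGA",
    "C": "TCGTTCGTTCGTTCGTACGT",
    "D": "TCGTTCGTTCGTTCGTTCGT",
}
dist = distance_matrix(alignment)
tree = upgma(list(alignment), dist)
```

UPGMA assumes a molecular clock (all leaves equidistant from the root); NJ drops that assumption and is usually preferred when rates vary across lineages.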
B) Practical component
6 practical sessions:
We suggest the use of both online and locally installed standalone software tools, so that the students become acquainted with both kinds. Examples include PhyML or RAxML to build phylogenies, and portals such as http://phylogeny.lirmm.fr/ .
This “Practical component” section follows the same structure as the previous “Theory lectures” section: the practicals aim at having the students manipulate the concepts seen in the lectures, right after they are introduced to them.
ASSESSMENT ACTIVITIES AND THEIR WEIGHTS
We suggest two written exams during the course of the module (total weight = 50%) and a final programming exam (weight = 50%). Practical assignments, administered throughout the module, can also count toward the module grade, but we advise against marking each and every practical, so as not to put counter-productive stress on the students. Practicals are privileged moments when students have the opportunity to understand the concepts as they put them into practice.