Prepared by: Kais Ghedira, Alia Benkahla and Victor Jongeneel
Module Name: Genomics and Comparative Genomics
Contact hours (to be used as a guide): Total (40 hours), Theory (40%), Practical (60%)
SPECIFIC OUTCOMES ADDRESSED
On completion of this module, students should:
1. Know the different steps and protocols used in a genome sequencing project.
2. Understand the organization of genomes, their components, and differences between the domains of life.
3. Understand how to generate physical and genetic maps of a genome.
4. Be familiar with homology searching tools.
5. Understand structural and functional annotation and the different tools that can be used for genome annotation.
6. Analyze microarray gene expression data using R packages to identify a set of differentially expressed genes, the GO terms enriched within this set, and the pathways in which these genes are involved.
7. Be able to predict regulatory elements in the promoters of orthologous or co-expressed genes using computational tools, and build a gene regulatory network.
8. Be familiar with genome and synteny browsers and with Circos, and know how to generate dotplots, Venn diagrams, etc.
BACKGROUND KNOWLEDGE REQUIRED
Prerequisite H3ABioNet bioinformatics module: Molecular Biology for Programmers
Additional: a solid background in cell and molecular biology; knowledge of molecular biology techniques; working knowledge of a scripting language (Perl, Python); and a good command of the Unix environment to run programs locally.
BOOKS & OTHER SOURCES USED
1. EBI Genomics: An introduction to the EBI resources genomics-introduction-ebi-resources
2. EBI Comparative Genomics comparative-genomics-data-ensembl
3. EBI Comparative genomics exercises comparative-genomics-exercise
4. The textbook “Introduction to Genomics” by Arthur Lesk
5. Original research publications in genomics and comparative genomics
A) Theory lectures
1. Genome organization across the domains of life (Bacteria, Archaea, Eukaryotes) and in viruses. This includes chromosomes, chromatin, histone proteins, protein-coding genes, introns and exons, operons, structural elements, repeat element families, transposons, pseudogenes, lincRNAs, the various classes of microRNAs, plasmids, etc.
2. Generation of genetic and physical maps. This includes variant discovery and genotyping, determining recombination frequencies, linkage disequilibrium, BAC and fosmid libraries, tiling paths, STS mapping, FISH, radiation hybrids, etc.
3. Organization of a genome sequencing project: from genetic and physical maps to a full genome sequence.
4. Homology searching. Definitions of orthology, paralogy and synteny. Using comparative genomics to build phylogenetic trees (gene/protein phylogeny vs. organism phylogeny). Multiple sequence alignments (MSAs) and the problem of generating biologically useful MSAs across the genomes of multiple species.
5. Genome annotation: structural and functional annotation; de novo and similarity-based annotation.
6. Functional genome analysis. Introduction to gene expression technologies: ESTs, SAGE, microarrays, RNA-Seq, etc. Analysis of microarray gene expression data to identify differentially expressed genes using R packages and methods (limma, affy, ANOVA…). Biological interpretation of microarray or RNA-Seq data. GO term enrichment of differentially expressed genes and identification of the pathways in which these genes are involved.
7. Gene regulation analysis. Prediction of transcription factor binding sites (TFBSs) and search for regulatory motifs in the promoters of orthologous or co-expressed genes, using phylogenetic footprinting, tools that detect over-represented motifs, or position weight matrices (PWMs).
8. Genome visualization: dotplots, Venn diagrams, Circos, genome browsers and synteny browsers.
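To illustrate the PWM-based TFBS prediction covered in lecture 7, the following minimal Python sketch builds a log2-odds position weight matrix from a few aligned binding sites and scans a promoter sequence for the best-scoring window. The example sites, the pseudocount of 0.5 and the uniform background frequency of 0.25 per base are hypothetical choices made for illustration only; they are not the parameters or scoring scheme of MATCH, F-MATCH or any other tool named in this module.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Return a log2-odds PWM: one {base: score} dict per motif position.

    Assumes all sites are equal-length uppercase ACGT strings and a
    uniform (hypothetical) background frequency for every base.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        scores = {}
        for base in BASES:
            # Pseudocounts avoid log(0) for bases never seen at this position.
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def scan(sequence, pwm):
    """Score every window of the motif's width; return (best_start, best_score).

    The sequence must contain only uppercase A, C, G, T.
    """
    width = len(pwm)
    best_start, best_score = None, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Window score = sum of per-position log-odds scores.
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Hypothetical aligned binding sites and promoter sequence (illustrative only).
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
pwm = build_pwm(sites)
start, score = scan("GCGCTATAAAGCGC", pwm)
print(start, round(score, 2))
```

A positive window score means the window looks more like the aligned sites than like the assumed background; real tools typically also scan the reverse strand and apply score thresholds to call putative sites.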
B) Practical component
1. Generation of genetic and physical maps. Use of computational tools for linkage disequilibrium analysis, variant discovery, STS mapping, etc. (follows lecture 2)
2. Comparison of whole genomes using pairwise and multiple alignments. Biological interpretation of results. Identification of putative homologs (orthologs and paralogs) and verification of synteny. Introduction to building phylogenetic trees of genes/proteins and of species. (follows lecture 4)
3. Genome annotation steps. Prediction of genes and gene structures using cDNAs, ESTs, etc. (with tools such as Artemis, RepeatMasker, GENSCAN, Genefinder, etc.). Protein annotation (mapping to known genes, protein domain prediction). (follows lecture 5)
4. Microarray gene expression analysis of a public Gene Expression Omnibus (GEO) dataset using R packages. Identification of differentially expressed genes and functional analysis using GO term enrichment (GOstats, GO::TermFinder, DAVID, etc.). (follows lecture 6)
5. Gene regulation analysis. Prediction of TFBSs in the promoter regions of orthologous target genes or co-expressed genes, using FootPrinter (MicroFootPrinter for prokaryotes), MEME to search for over-represented motifs in promoter sequences, and PWM-based TFBS prediction tools (MATCH, F-MATCH). Visualization of gene regulatory networks using Cytoscape. (follows lecture 7)
6. Generating dotplots and Venn diagrams, and using Circos and genome/synteny browsers to visualize similarities and differences in genome structure and positional relationships between genomic intervals. (follows lecture 8)
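The principle behind the dotplots in practical 6 can be sketched in a few lines of Python: a dot is placed at (i, j) whenever the k-mer starting at position i of one sequence is identical to the k-mer starting at position j of the other. The word size k = 3 and the toy sequence below are hypothetical; dedicated dotplot tools typically also report reverse-complement matches, tolerate mismatches and draw the matrix graphically rather than as text.

```python
def dotplot(seq_a, seq_b, k=3):
    """Return the set of (i, j) where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Index every k-mer of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    # Look up each k-mer of seq_a in the index and record every match.
    dots = set()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.add((i, j))
    return dots

def render(seq_a, seq_b, dots, k=3):
    """Text rendering of the dot matrix: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a) - k + 1):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b) - k + 1))
        rows.append(row)
    return "\n".join(rows)

# Hypothetical toy sequence containing a tandem repeat (ACGT ACGT).
a = "ACGTACGT"
dots = dotplot(a, a)
print(render(a, a, dots))
```

In this self-comparison the main diagonal reflects identity of the sequence with itself, while the short parallel diagonals offset by four positions reveal the tandem repeat; the same reading applies when comparing two different genomes, where diagonals indicate conserved, collinear segments.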
ASSESSMENT ACTIVITIES AND THEIR WEIGHTS
Two written exams (mid-term and final) on theory (40% weight)
Student reports on practicals (40% weight)
Final oral examination (20% weight)