Prepared by: Pandam Salifu
Module Name: Sequence Analysis
Contact hours (to be used as a guide):
Total (40 hrs), Theory (45%), Practicals (55%)
SPECIFIC OUTCOMES ADDRESSED
On completion of this module, students should be able to:
1. Know and understand sequence data formats and retrieve data from various biological databases using their respective search engines.
2. Understand common sequence analysis algorithms – different versions of nucleotide and protein BLAST and web-based, freely available software.
3. Understand alignment and methods for analysis and interpretation of biomolecular sequences.
4. Identify remote homologues, orthologues, paralogues, conserved motifs, signatures, domains, promoters and regulatory elements.
5. Annotate small genomes and microbial genomes.
BACKGROUND KNOWLEDGE REQUIRED
H3ABioNet bioinformatics modules as pre-requisites: Introduction to Molecular Biology for Programmers (if applicable to the individual)
Additional: Detailed knowledge and understanding of the central dogma, the genetic code, nucleic acid structures and the properties of amino acids.
BOOKS & OTHER SOURCES USED
1. Bioinformatics vol. 1: Data, sequence analysis & evolution; J. M. Keith; Humana Press, 2008
2. Bioinformatics: A practical guide to the analysis of genes and proteins, ed. by Andreas D. Baxevanis and B. F. Francis Ouellette, Wiley Publication, 2004
3. Bioinformatics, a practical approach by Shui Qing Ye, Chapman & Hall Publication, 2008
4. EMBL-EBI online training: Using the primary nucleotide sequence resource
A) Theory lectures
1. Introduction to sequence analysis; file formats for biomolecular sequences (FASTA, GenBank, GCG, MSF, etc.) and conversions between them; concepts of sequence similarity, identity and homology.
2. Scoring models for gap penalties, basic concept of a scoring matrix, PAM and BLOSUM series.
3. Homology searches using different versions of BLAST and FASTA, and interpretation of the results to derive biologically significant relationships between the query sequences and the database sequences.
4. Pair-wise local and global alignments of DNA and protein sequences using the Smith-Waterman and Needleman-Wunsch algorithms, and interpretation of results to deduce patterns, similarities, differences, homology, etc.
5. Multiple sequence alignments of sets of sequences using different versions of CLUSTAL and other software. Interpretation of results to identify conserved and variable regions and correlate them with physicochemical and structural properties.
6. Exploring and pattern searching for domains, motifs and signatures in biological databases: PROSITE, PRINTS, Pfam, SMART and ProDom.
7. Computational prediction of cis-regulatory elements and transcription factor binding sites: pattern discovery, pattern matching for prokaryotes and eukaryotes.
8. Application of sequence analysis for genome assembly and annotation.
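As an illustration of the global alignment covered in lecture 4, the following is a minimal sketch of Needleman-Wunsch scoring. It assumes a simple match/mismatch score and a linear gap penalty; the function name and the default values (match=1, mismatch=-1, gap=-2) are illustrative choices, not values prescribed by this module. It returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Illustrative scoring: fixed match/mismatch values and a linear gap
    penalty (assumed defaults, not taken from the module).
    """
    # prev[j] is the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns an empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: a[:i] aligned against gaps only.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(needleman_wunsch_score("ACGT", "ACT"))
```

In practice a substitution matrix from the PAM or BLOSUM series (lecture 2) would replace the fixed match/mismatch values when scoring protein sequences.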
B) Practical component
1. Exploring the integrated database system at the NCBI server and querying the PubMed and GenBank databases using the Entrez search engine; sequence formats, format converters and retrieval from databases. (follows lecture 1)
2. Exploring and querying Swiss-Prot, UniProtKB and NCBI using different versions of BLAST and FASTA. (follows lecture 2)
3. Exploring tools on ExPASy for pair-wise and multiple sequence alignment. (follows lectures 3 and 4)
4. Exploring protein family and domain databases such as SMART, Pfam, CATH, PROSITE, InterPro, etc. (follows lecture 5)
5. Exploring tools for predicting promoters and transcription start and termination sites. (follows lecture 6)
6. Whole-genome alignments and visualization tools (VISTA Browser, UCSC Genome Browser, Ensembl). (follows lecture 7)
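For the sequence-format work in practical 1, a minimal sketch of reading FASTA-formatted text may help as a starting point. It assumes each record starts with a ">" header line whose first whitespace-delimited word is the identifier, and that all following lines up to the next header belong to that sequence. The function name parse_fasta is hypothetical; established libraries handle many more cases than this sketch does.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.

    Assumes the identifier is the first word after '>' on a header line.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            header = words[0] if words else ""
            records[header] = []
        elif header is not None:
            # Sequence lines are collected and joined at the end.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1 example record\nACGT\nTT\n>seq2\nGG\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), seq)
```

Keying records by identifier keeps the sketch short; if two records share an identifier, the later one replaces the earlier.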
ASSESSMENT ACTIVITIES AND THEIR WEIGHTS
Student report on practicals (20% weight)
Mini-project on annotation and analysis of genomic data (40% weight)
Two written exams (mid-term and final) on theory (40% weight)