**Prepared and Revised by:** Shakuntala Baichoo

**Module Name:** Programming II (Data structures, algorithms)

**Contact hours** (to be used as a guide)**: **

Theory (40 hours), Practicals (50%), Total (50%)

**LEARNING OBJECTIVES **

The main purpose of this module is to introduce to the students the concepts relating to problem solving through the efficient use of algorithms and subsequent implementation of the algorithm in any language of choice that is suitable to the application area. More specifically, in this module students will:

1. Understand concepts of algorithm analysis

2. Be exposed to the basic searching and sorting algorithms

3. Understand the use of dynamic data structures such as linked lists, trees and graphs

4. Learn about sequence alignment algorithms

5. Be exposed to the different algorithms based on trees and graphs

**SPECIFIC OUTCOMES ADDRESSED **

This module builds on the Programming I module and examines the important data structures for genomic data processing and studies the fundamental techniques in algorithm design and analysis. Upon completion of this module students should be able to:

1. Perform asymptotic analysis of algorithms

2. Write programs to manipulate simple data structures such as arrays and lists

3. Understand the importance of advanced data structures such as red-black and 2-3-4 trees and graph

4. Understand the different algorithm design paradigms including dynamic programming and backtracking

5. Develop and use the pairwise sequence alignment algorithms

6. Understand the basic sorting and searching algorithms

7. Write programs dealing with graph algorithms

8. Write programs using Suffix-tree data structures

9. Use string matching algorithms for specific bioinformatics problems

**BACKGROUND KNOWLEDGE REQUIRED**

**H3ABioNet bioinformatics modules as pre-requisites:** Programming I

**Additional:** very proficient in at least one programming language.

**BOOKS & OTHER SOURCES USED**

1. Sung W.K., Algorithms in Bioinformatics – A practical introduction (2010), ISBN: 978-1-4200-7033-0

2. Jones N. C., Pevzner P. A., An Introduction to Bioinformatics Algorithms (Computational Molecular Biology) (2000), ISBN-13: 978-0262101066

**COURSE CONTENT **

**A) Theory lectures** (hour allocations include practical time)

1. Overview of data Structures and algorithm design paradigms (3 hour(s))

2. Algorithm complexity analysis (Using Big-Oh notation) and recurrence relations (3 hour(s))

3. Multi-dimensional arrays and Sorting Algorithms including MergeSort and Insertion Sort (8 hour(s))

4. Dynamic programming and pairwise sequence alignment algorithms (8 hour(s))

5. Binary Search Trees and Red-black trees (5 hour(s))

6. Suffix trees and its application to Bioinformatics (5 hour(s))

7. Graph data structures and Graph algorithms and their application to sequence assembly (5 hour(s))

8. String searching algorithms and their application to the motif finding problem (3 hours)

**B) Practical component **

The goal of this module is to inculcate a solid basis for the data structures and algorithms required necessary in processing bioinformatics data. Throughout the course, students will be required to test the concepts through relevant computational problems using the Python programming language. This section “practical component” follows the same structure as the previous section “Theory lectures”: practicals just aim at having the students manipulate the concepts seen in the lectures, right after they were introduced to them.

**ASSESSMENT ACTIVITIES AND THEIR WEIGHTS**

Class Exercises: 5%

2 Written Tests: 30%

Practical Assignment: 15%*

Final Written Exam: 50%

*our advice is not to make each and every practical for marks, not to put too much counter-productive stress on the students. Practicals are privileged moments when students have the opportunity to understand the concepts as they put them into play.