Prepared by: Amel Ghouila
Module Name: Databases II
Contact hours (to be used as a guide): Total (40 hours), Theory (60%), Practical (40%)
1. Students will learn about the limits of classical relational database architectures when dealing with complex and big biological data, and the need to build scalable architectures.
2. Students will be introduced to normalization issues.
3. Students will learn about data warehouse and data mart concepts.
4. Students will learn how to use and query available BioMarts, build their own BioMarts and integrate various sources.
SPECIFIC OUTCOMES ADDRESSED
1- Students should be able to explain the limits of the classical relational database architecture.
2- Students should be able to build normalized databases and apply normalization principles to improve database architecture.
3- Students should demonstrate a good understanding of data warehouse and data mart concepts and of what differentiates them from classical databases.
4- Students should be able to connect to publicly available Biomarts and extract information from them.
5- Students should be able to build their own Biomarts and integrate various sources.
BACKGROUND KNOWLEDGE REQUIRED
BOOKS & OTHER SOURCES USED
1. Data Mining: Concepts and Techniques, Southeast Asia Edition. Jiawei Han and Micheline Kamber.
2. Development of a Database Course for Bioinformatics. Chen, Z. (2012). Procedia Computer Science, Pages 532–539. Proceedings of the International Conference on Computational Science, ICCS 2012.
3. BioMart 0.7 Documentation.
A) Theory lectures
1. Introduction: reminder of the relational database schema and its limits.
a. Limits of relational databases when handling complex data types
b. Normalization issues
2. Challenges faced in biological data management.
a. Various data types
b. Various databases and structures
c. Big data integration
d. How to build scalable and extensible database architectures
e. Web-based resources
3. Data warehouses.
a. Basic concepts and why we need data warehouses
b. Data cleaning, integration and data transformation
c. Data warehouse implementation: the different steps of design and construction
d. Data refreshing and maintenance
4. OLAP and BIOLAP servers.
5. Datamarts and BioMarts.
a. Basic concepts
b. Difference between data warehouses and data marts
c. Agnostic modelling
d. Principles of data federation and retrieving from various sources
6. Building marts from your own data.
a. Using MySQL for DM and data warehouses
b. Building your own marts with BioMart (MartBuilder)
c. Adjusting the mart schema
d. Mart Query Language (MQL)
7. Other examples of third-party software
a. The Generic Model Organism Database (GMOD)
B) Practical component
1. Exercises on optimizing database architectures. (follows lecture 2)
2. Exploring examples of public databases and databases with BioMart interfaces (e.g. Ensembl MartView, HapMap MartView, etc.) and studying their important features. (follows lecture 5)
3. Using the biomaRt package to retrieve data from various databases. (follows lecture 6)
4. Learning how to set up databases and make them scalable. (follows lecture 6)
5. Using BioMart tools. (follows lecture 6)
6. Configuring and using MartBuilder to build your own marts, federated with data coming from various sources. (follows lecture 6)
7. Querying marts with MQL. (follows lecture 6)
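As a rough illustration of the kind of exercise practicals 1 and 4 could involve, the sketch below splits a small flat table into two normalized tables and then derives a denormalized, mart-style table from them. It uses Python's built-in sqlite3 module rather than MySQL or MartBuilder, purely so that it runs without any setup. The table names, column names and sample rows are assumptions made for this example and are not part of the course material.

```python
import sqlite3

# Hypothetical flat input: organism details are repeated on every gene row.
flat_rows = [
    ("BRCA1", "Homo sapiens", 9606),
    ("TP53", "Homo sapiens", 9606),
    ("Trp53", "Mus musculus", 10090),
]

conn = sqlite3.connect(":memory:")

# Normalized schema (illustrative names): each organism is stored once
# and genes refer to it through a foreign key.
conn.executescript("""
    CREATE TABLE organism (
        taxon_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE gene (
        gene_id  INTEGER PRIMARY KEY,
        symbol   TEXT NOT NULL,
        taxon_id INTEGER NOT NULL REFERENCES organism(taxon_id)
    );
""")

for symbol, organism, taxon_id in flat_rows:
    # INSERT OR IGNORE skips organisms that are already present.
    conn.execute(
        "INSERT OR IGNORE INTO organism (taxon_id, name) VALUES (?, ?)",
        (taxon_id, organism),
    )
    conn.execute(
        "INSERT INTO gene (symbol, taxon_id) VALUES (?, ?)",
        (symbol, taxon_id),
    )

# Mart-style table: a denormalized copy built from the normalized tables,
# intended for simple read-only queries.
conn.execute("""
    CREATE TABLE gene_mart AS
    SELECT g.symbol, o.name AS organism, o.taxon_id
    FROM gene AS g
    JOIN organism AS o ON g.taxon_id = o.taxon_id
""")

organism_count = conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]
mart_rows = conn.execute(
    "SELECT symbol, organism, taxon_id FROM gene_mart ORDER BY symbol"
).fetchall()
```

Here the normalized tables avoid repeating organism details, while the derived gene_mart table trades that redundancy back for simpler querying, which is one way to frame the contrast between a classical database and a mart.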
ASSESSMENT ACTIVITIES AND THEIR WEIGHTS
End of semester examination (50% weight)
Small projects with 2 or 3 students working on each (50% weight)
Project topic examples:
1. Use of the biomaRt package to answer different types of questions and retrieve data from various sources.
2. Building BioMarts using the MartBuilder tool (integrating different data sources and querying the mart to answer different types of questions).