Computational and mathematical approaches for statistical sequence alignment and phylogenetic inference on emerging parallel architectures

Applicants Professor Dr. Dirk Metzler; Professor Dr. Alexandros Stamatakis
Co-Applicant Professor Dr. Arndt von Haeseler
Subject Area Bioinformatics and Theoretical Biology
Term from 2011 to 2016
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 200966394
 

Project Description

Bioinformatics currently faces two challenges. First, significant advances in sequencing techniques (454, Solexa) are generating an unprecedented amount of molecular data. Hence, the bottleneck is no longer data acquisition but data analysis, especially in molecular evolution. Second, parallel computing is undergoing the multi-core revolution on general-purpose CPUs, accompanied by a plethora of novel accelerator technologies such as GPUs (Graphics Processing Units). As a result, parallel computing is becoming feasible on personal computers. Nevertheless, the volume of biological data stored in public databases (e.g., GenBank) is growing at a significantly higher rate than computational power. Thus, we need to substantially improve the models, data structures, and algorithms used for data analysis. Here, we propose to tackle these challenges for the two closely intertwined fields of Statistical Multiple Sequence Alignment (sMSA) and Phylogenetic Inference (PI) via an integrated approach. We will develop a highly optimized, portable, parallelized, and versatile library for sMSA and PI. We will also improve statistical models for sMSA and search heuristics for PI, and integrate them into a next-generation bioinformatics tool for evolutionary biology. The underlying idea is to develop models and methods so that they scale on all modern multi-core, accelerator, and supercomputer architectures.
DFG Programme Research Grants
International Connection Austria, Vietnam
Participating Person Dr. Le Sy Vinh