Project Details

Precise Methods for Orthology Assessment in Large Data Sets Using Best Matches

Subject Area: Bioinformatics and Theoretical Biology; Mathematics
Term: since 2020
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 432974470
 
Orthology detection is an important task for genome annotation, gene nomenclature, and the understanding of gene evolution. With the rapidly accelerating pace at which new genomes become available, highly efficient methods are urgently required. As demonstrated in a large body of literature, reciprocal best match methods are reasonably accurate and scale to large data sets. Nevertheless, they are far from perfect and prone to both false positive and false negative orthology calls. Drawing upon recent advances in phylogenetic combinatorics, we propose to develop practical methods that compute, from reciprocal best hits (as scored by sequence (dis)similarity), the reciprocally most closely related sequences, i.e., the best matches in the proper evolutionary sense. These are directly related to orthology. The goal of the proposed work is to develop a software library that implements this kind of data correction not only for duplication-loss scenarios but also in the presence of horizontal gene transfer. To this end we will again make use of recent advances in the mathematical understanding of orthology, reconciliation maps between trees, and event labelings. Instead of building yet another orthology assessment tool, we will implement an open source software library intended to make it easy for the community to include the new algorithms in their own pipelines and tools. As a showcase application we will develop a new backend for ProteinOrtho, an orthology assessment tool maintained by the Lechner Group in Marburg.
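For readers unfamiliar with the starting point of the project, the following is a minimal sketch of how reciprocal best hits can be computed from pairwise similarity scores. It illustrates only the uncorrected, similarity-based step mentioned in the abstract, not the proposed correction to evolutionary best matches, and it is not taken from ProteinOrtho or from the planned library. The function names, the input layout (a dictionary of symmetric scores keyed by gene pairs, where a higher score means more similar) and the toy data are all assumptions made for illustration.

```python
def best_hits(scores, species_of):
    """For each gene, collect its top-scoring partner(s) in every other species.

    scores: dict mapping (gene_a, gene_b) -> similarity (assumed symmetric,
            higher = more similar); hypothetical input layout.
    species_of: dict mapping gene -> species label.
    """
    best = {}  # gene -> {species: (score, set_of_partner_genes)}
    for (a, b), s in scores.items():
        for x, y in ((a, b), (b, a)):
            if species_of[x] == species_of[y]:
                continue  # only compare genes from different species
            per_species = best.setdefault(x, {})
            current = per_species.get(species_of[y])
            if current is None or s > current[0]:
                per_species[species_of[y]] = (s, {y})
            elif s == current[0]:
                current[1].add(y)  # keep ties as co-optimal hits
    return best


def reciprocal_best_hits(scores, species_of):
    """Return the set of unordered gene pairs that are best hits of each other."""
    best = best_hits(scores, species_of)
    pairs = set()
    for x, per_species in best.items():
        for _, partners in per_species.values():
            for y in partners:
                back = best.get(y, {}).get(species_of[x])
                if back is not None and x in back[1]:
                    pairs.add(frozenset((x, y)))
    return pairs


# Toy data (invented): two genes each in species A and B.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "b1"): 90,
    ("a1", "b2"): 60,
    ("a2", "b1"): 70,
    ("a2", "b2"): 40,
}
rbh = reciprocal_best_hits(scores, species_of)
```

In this toy example only a1 and b1 are each other's best hit; a2 and b2 both point to a partner that prefers another gene. Cases like these, where sequence similarity and evolutionary relatedness may disagree, are what the correction methods described above are meant to address.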
DFG Programme: Research Grants
 
 

