Project Details
Improved protein quantification with mathematical optimization techniques using bipartite peptide-protein graph structures
Applicant
Professor Dr. Martin Eisenacher
Subject Area
Bioinformatics and Theoretical Biology
Term
since 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 532401634
In bottom-up proteomics, proteins are digested into peptides, which are then measured by mass spectrometry. The measured peptide quantities therefore have to be summarized into protein quantities to obtain biological insights (protein quantification). This is complicated by the presence of not only unique peptides but also shared peptides, which occur in multiple protein sequences. The relationship between peptides and their corresponding proteins can be represented as bipartite graphs. In this project, we want to use the structure of these bipartite peptide-protein graphs to improve the protein quantification step. The proposed method builds an equation for each measured peptide and minimizes the error terms using mathematical optimization techniques. We plan to use relative quantification, i.e., the peptide ratios calculated between two samples, and to obtain estimates of the corresponding protein ratios as a result. The focus is primarily on improving the quantification of proteins without unique peptides, which are neglected by many currently used protein quantification methods. Another focus is on the appropriate handling of missing values, which occur frequently in bottom-up proteomics and can have a large impact on the calculated protein quantities. This is especially important for the detection of on/off proteins, which are present in one experimental group and absent in another. To enable statistical tests on the protein ratios, we will develop a method for estimating their uncertainty and variance. The proposed algorithm will be validated on different gold standard data sets with known protein ratios. As these data sets do not cover all scenarios that may occur in real data, we additionally plan to establish a method for simulating such data. A comparison with other state-of-the-art methods for protein quantification will also be included.
The proposed method will be implemented in a user-friendly software tool using R and RShiny that can be used by researchers with and without programming experience. All in all, we see great potential in the proposed quantification algorithm, e.g., for supporting biomarker discovery in proteomics studies.
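The idea of one equation per measured peptide, solved by minimizing the error terms, can be illustrated with a small sketch. It is not the project's algorithm: it assumes, purely for illustration, that each peptide ratio equals the unweighted mean of the ratios of its parent proteins, and it minimizes the squared error by ordinary least squares. The function names, the toy graph, and the ratio values are all hypothetical, and Python is used here only for illustration (the project itself plans an R implementation).

```python
from typing import Dict, List, Tuple


def build_system(
    peptide_to_proteins: Dict[str, List[str]],
    peptide_ratios: Dict[str, float],
    proteins: List[str],
) -> Tuple[List[List[float]], List[float]]:
    """Build one linear equation per measured peptide.

    Illustrative assumption: a peptide ratio is the unweighted mean of the
    ratios of all proteins it maps to in the bipartite graph.
    """
    index = {prot: i for i, prot in enumerate(proteins)}
    rows, rhs = [], []
    for pep, ratio in peptide_ratios.items():  # unmeasured peptides are simply absent
        parents = peptide_to_proteins[pep]
        row = [0.0] * len(proteins)
        for prot in parents:
            row[index[prot]] = 1.0 / len(parents)
        rows.append(row)
        rhs.append(ratio)
    return rows, rhs


def solve_least_squares(rows: List[List[float]], rhs: List[float]) -> List[float]:
    """Minimize the sum of squared equation errors via the normal equations."""
    n = len(rows[0])
    # Normal equations: (A^T A) x = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    aug = [ata[i] + [atb[i]] for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("protein ratios are not identifiable from these peptides")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


# Toy bipartite graph: protein B has no unique peptide, only shared ones.
peptide_to_proteins = {
    "pep1": ["A"],
    "pep2": ["A", "B"],
    "pep3": ["B", "C"],
    "pep4": ["C"],
}
# Hypothetical peptide ratios between two samples (made-up numbers).
peptide_ratios = {"pep1": 1.0, "pep2": 1.5, "pep3": 3.0, "pep4": 4.0}
proteins = ["A", "B", "C"]

rows, rhs = build_system(peptide_to_proteins, peptide_ratios, proteins)
estimates = dict(zip(proteins, solve_least_squares(rows, rhs)))
print(estimates)
```

In this toy graph, the ratio of protein B is recovered only through its shared peptides (approximately 2, alongside approximately 1 for A and 4 for C), which is the kind of case the abstract highlights. A realistic method would also need to address peptide-specific weights, missing values, and uncertainty estimates, none of which this sketch attempts.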
DFG Programme
Research Grants