Project Details
Machine learning aided causal inference: Harnessing Multidimensional Omics Data to Improve Understanding of Complex Diseases
Applicant
Dr. Pascal Schlosser
Subject Area
Epidemiology and Medical Biometry/Statistics
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 530592017
The proteome and the metabolome are closely linked. Proteins carry out a wide range of biological functions, from catalyzing enzymatic reactions to enabling molecular transport. These processes often involve metabolites as substrates, intermediates, or end products. Metabolites are central to energy generation and homeostasis, and their concentrations are often tightly regulated through generation, transport across membranes, breakdown, and excretion. Technical advances in the quantification of molecular traits such as proteins and metabolites have made it possible to measure broad panels in large population and patient studies. Moreover, the establishment of modern biobanks that combine genetic data with clinical information from electronic health records has facilitated phenome-wide genetic screens on an unprecedented scale. While such screens are well suited to discovering statistical associations, the integrated analysis of these associations together with underlying, correlated molecular pathways or with additional, correlated health conditions remains an open statistical challenge. Here, we propose to develop and apply machine learning-aided approaches to study the association patterns of molecular traits and health conditions in a data-driven network analysis, i.e., in an unsupervised manner. We will focus on clustering and identify associations between groups of intermediate molecular phenotypes and groups of related diseases. Algorithms will be designed to take advantage of strong genetic instruments for molecular traits and thereby enable causal conclusions in the direction from molecular trait to disease. The developed methodology will be phenotype-independent and scalable, and will overcome limitations of current genetic methodology. Furthermore, we will extend the approach to new data types: the mitochondrial genome, a commonly ignored part of human DNA, and molecular intermediates resolved at the level of the cell type rather than the target tissue.
The primary applications will be phenome-wide, followed by a detailed study of the connection between mitochondria and metabolites and a focus on kidney function. The focus on the kidney is motivated by the central role of mitochondria in generating the energy required to remove waste from the blood, and it can showcase how the approach extends to other organs and tissues. All algorithms will be implemented as easily accessible software packages to enable their widespread use by the scientific community. Overall, we will develop a concept to study the genetic basis of thousands of complex traits and diseases via molecular phenotypes in a hypothesis-free and unbiased manner, which may ultimately improve the selection of potential therapeutic targets and aid in the prioritization of experimental follow-up studies.
DFG Programme
Independent Junior Research Groups