Project Details
dCortools: Distance Correlation Methods for Detecting Nonlinear Associations in High-Dimensional Molecular Data
Applicant
Dr. Dominic Edelmann
Subject Area
Medical Informatics and Medical Bioinformatics
Term
from 2019 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 417754611
Virtually all methods currently used to test for associations in high-dimensional molecular data can detect only linear or monotone associations. This applies both to tests for associations between different molecular variables (e.g. gene-gene interactions) and to tests for associations between molecular and clinical variables (e.g. gene-environment interactions).
However, many biological relationships are known to be more complex, including nonmonotone or even nonfunctional dependencies. Distance correlation is a novel dependence measure that can detect every kind of dependence between random vectors of arbitrary dimensions. Moreover, the distance correlation coefficient is very easy to compute, which makes it well suited to statistical practice. Despite these convincing properties, there have so far been only a few applications of the distance correlation coefficient to high-dimensional molecular data. This is due partly to a lack of methodology for biostatistical problems and partly to a lack of application-oriented software. The goal of this project is to close this gap.
In the first part of the project, we plan to develop distance correlation methodology for biomedical applications. First, we aim to derive iterative variable selection procedures that are much more efficient than univariate procedures under the assumption of strong correlation structures, which are typically present in molecular data. Moreover, we propose to extend the distance correlation coefficient to survival data, which are particularly important in cancer research.
In the second part of the project, we plan to create a user-friendly R package that combines the distance correlation methods useful for biostatistics and thus makes this methodology available to practitioners. The techniques developed in the first part of the project will be important components of this R package.
Finally, we propose to apply the R package to a data set from the DACHS study, comprising epigenome-wide methylation data as well as epidemiological and clinical data for more than 2000 patients with colorectal cancer.
We are confident that the planned project will considerably increase the use of distance correlation methodology in biostatistical practice. For molecular data, this will make it possible to detect complex associations that would be missed if linear procedures were used. This in turn may lead to a better understanding of biological processes.
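To illustrate the claim that distance correlation can pick up nonmonotone dependence that a linear measure misses, the following is a minimal sketch in Python, not the project's R software. It assumes the standard sample distance correlation based on double-centered pairwise distance matrices (the biased V-statistic form), restricted here to univariate samples for brevity. The function names `distance_correlation` and `pearson` are hypothetical and chosen only for this illustration.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column and grand mean."""
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    # The distance matrix is symmetric, so row means equal column means.
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Mean of the elementwise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two univariate samples (biased estimator)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)   # squared sample distance covariance
    dvar_x = _mean_product(a, a)  # squared sample distance variance of x
    dvar_y = _mean_product(b, b)  # squared sample distance variance of y
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # convention when one sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: a symmetric grid and a purely quadratic (nonmonotone) response.
x = [i / 10 for i in range(-10, 11)]
y = [xi ** 2 for xi in x]

r_linear = pearson(x, y)              # essentially zero by symmetry
r_dcor = distance_correlation(x, y)   # clearly positive
```

On this toy example the Pearson correlation vanishes by symmetry, while the distance correlation is clearly positive. The quadratic cost in the sample size is acceptable for a sketch; applications to high-dimensional molecular data would call for a more efficient implementation and for multivariate distances.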
DFG Programme
Research Grants