Project Details
Scalable Graph-learning with False Discovery Rate Control
Applicant
Professor Dr.-Ing. Michael Muma
Subject Area
Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 550090872
Graphical models are widely applied across scientific fields because they capture intricate relationships among variables. Driven by technological progress and the demand to understand increasingly complex data structures, the number of variables has surged in numerous domains. This is particularly evident in the inference of biological networks, which we therefore address, together with domain experts, as a practical use case. In the scenarios considered, the sample size may be of the same order as, or even smaller than, the number of variables (e.g., when studying rare diseases). Consequently, the potential edges within a graphical model by far outnumber the available samples, and structural assumptions such as sparsity become imperative for constructing meaningful, interpretable, and reliable graphical models. Determining the appropriate level of sparsity, especially for high-dimensional data, requires a tradeoff between false positive and false negative edges. Our scientific approach is to maximize the proportion of true edges that are discovered while controlling the false discovery rate (FDR) at an acceptable target level. In summary, our work focuses on deriving new FDR-controlling methods for graphical models that scale to high-dimensional data, and on applying them to practical biomedical use cases to demonstrate the relevance of our methodological work. To date, research addressing provable FDR control in high-dimensional graphical models remains very limited. Promising preliminary work has recently been developed in our group.
Within this research project, we shall build upon our Terminating-Random Experiments (T-Rex) framework to develop new FDR-controlling graph estimation methods that i) scale in terms of computational complexity to clinically relevant high-dimensional settings; ii) capture different types of structural dependencies among variables (e.g., hierarchical, grouped); iii) apply to heavy-tailed data distributions (e.g., elliptical distributions); iv) do not require manual parameter tuning (e.g., of a sparsity parameter) but instead self-calibrate to maximize the true positive rate (TPR) while guaranteeing FDR control. The proposed framework can also be extended to provide other finite-sample statistical guarantees on the errors (e.g., on the probability of falsely connecting disjoint components of a graph). The work is organized into five work packages (WPs). Within WP 1, we shall develop FDR-controlling pseudo-likelihood methods, while WP 2 focuses on score matching approaches. WP 3 is concerned with structural variable selection. In cooperation with domain experts, WP 4 applies the developed methods to determine reproducible conditional independence graphs for high-dimensional biological networks. Finally, WP 5 provides well-documented open-source software packages and visualisation tools to increase the impact of our research.
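As an illustration of the two error measures the abstract trades off, the following minimal sketch computes the false discovery proportion (FDP) and true positive proportion (TPP) of one estimated undirected edge set against a known reference graph. It assumes the common convention that the FDR and TPR are the expectations of these two proportions over repeated data sets. The function names and the toy edge lists are hypothetical, chosen only for illustration; this is not the T-Rex method or any part of the project's software.

```python
def edge_set(edges):
    """Normalise undirected edges so that (a, b) and (b, a) coincide."""
    return {frozenset(edge) for edge in edges}


def fdp_and_tpp(estimated_edges, true_edges):
    """Return (FDP, TPP) for one estimated graph against a reference graph.

    FDP = false discoveries / max(number of selected edges, 1)
    TPP = true discoveries  / max(number of true edges, 1)
    The max(., 1) guards are a convention assumed here to avoid 0/0.
    """
    estimated = edge_set(estimated_edges)
    truth = edge_set(true_edges)
    true_positives = len(estimated & truth)
    false_positives = len(estimated - truth)
    fdp = false_positives / max(len(estimated), 1)
    tpp = true_positives / max(len(truth), 1)
    return fdp, tpp


# Toy example (hypothetical data): a chain graph on five nodes.
true_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
# The estimate recovers three true edges and adds one false edge (0, 4).
estimated_edges = [(1, 0), (1, 2), (2, 3), (0, 4)]

fdp, tpp = fdp_and_tpp(estimated_edges, true_edges)
print(f"FDP = {fdp:.2f}, TPP = {tpp:.2f}")
```

Averaging the FDP and TPP over many simulated data sets gives empirical estimates of the FDR and TPR. An FDR-controlling method, in the sense described above, keeps the former at or below the chosen target level while making the latter as large as possible.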
DFG Programme
Research Grants
International Connection
China (Hong Kong), Finland, France, United Kingdom
Co-Investigators
Dr. Maik Pietzner; Professor Dr. Philipp Wild