Project Details
Elucidating Fingerprints – Towards a Holistic Explanatory Toolbox for Molecular Machine Learning
Subject Area
Organic Molecular Chemistry - Synthesis and Characterisation
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 497089464
The central aim of this proposal is the development of out-of-the-box tools for interpretable and explainable molecular machine learning (MML) at the structural level. Within this project, widely used molecular representations will be developed, adapted and used to train robust yet accurate models (e.g. gradient boosting algorithms). Starting from these models, an open-source software pipeline will be employed to map feature importance, influence and interdependencies, as well as model confidence, back onto the molecular structure, giving trained chemists a straightforward handle for molecular and reaction design. An important part of this work will be the development of visualizations based on analytic results that are highly accurate on the one hand and easy to understand for any scientist working in the molecular sciences on the other. These tools shall be usable both for investigating and improving the underlying datasets and for molecular design. In addition to the coloration and visualization of individual molecules, methods for the statistical evaluation of the general influence of functional groups are to be developed, so that rules for further reaction design can be derived. Finally, these rules are to be applied in the laboratory to validate the explanatory methods developed in the course of this project. With these objectives, the proposal aims to fulfil the following general goals of the Priority Programme (PP): “Application of state-of-the-art ML algorithms – Explainable AI”, “Development of (domain specific) molecular representations – Generally improved molecular representations” and “Prediction, understanding and interpretation of molecular properties – Improvement of current applications”. Within this scope, a strong focus lies on the interpretation and explanation of models for quantitative yield prediction, in order to find handles for systematic improvement within this underdeveloped area of MML, which has also been defined as a major topic of this PP.
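The step of mapping feature importance back onto the molecular structure can be illustrated with a minimal sketch. It is not the project's pipeline: the function name `atom_attributions`, the input layout (a per-bit importance score plus, for each fingerprint bit, the atom environments that set it) and the equal-split rule are all assumptions made for illustration. In practice the importances might come from a trained gradient boosting model and the environments from a cheminformatics toolkit.

```python
def atom_attributions(bit_importance, bit_to_environments, n_atoms):
    """Distribute fingerprint-bit importances onto atoms (hypothetical scheme).

    bit_importance:      {bit_id: importance score}
    bit_to_environments: {bit_id: [[atom indices of one occurrence], ...]}
    n_atoms:             number of atoms in the molecule

    Assumption: a bit's importance is split equally across its occurrences,
    and each occurrence's share is split equally across its atoms.
    """
    weights = [0.0] * n_atoms
    for bit, importance in bit_importance.items():
        # Ignore empty environments; bits absent from the molecule add nothing.
        environments = [env for env in bit_to_environments.get(bit, []) if env]
        if not environments:
            continue
        per_environment = importance / len(environments)
        for env in environments:
            share = per_environment / len(env)
            for atom in env:
                weights[atom] += share
    return weights


# Toy input: a three-atom molecule with made-up bit ids and scores.
toy_importance = {10: 0.6, 42: 0.3, 99: 0.1}
toy_environments = {10: [[2]], 42: [[1, 2]], 99: [[0], [1]]}
for index, weight in enumerate(atom_attributions(toy_importance, toy_environments, 3)):
    print(f"atom {index}: {weight:.2f}")
```

The resulting per-atom weights could then drive the coloration of a structure drawing. The equal-split rule is only one of several possible choices and conserves the total importance of all bits present in the molecule.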
DFG Programme
Priority Programmes