Project Details
Exploring Chemical Compound Space with Machine Learning
Subject Area
Theoretical Computer Science
Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term
from 2014 to 2017
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 253375148
The accurate prediction of molecular properties in chemical compound space (CCS) is a crucial ingredient of rational compound design in the chemical and pharmaceutical industries. One of the major challenges is therefore to enable quantitative calculations of molecular properties in CCS at moderate computational cost (milliseconds per molecule or faster). Currently, however, only high-level quantum-chemical calculations, which can take up to several days per molecule, yield the 'chemical accuracy' (1 kcal/mol) required for predictive in silico rational molecular design.
Machine learning (ML) methods have been successfully used to map the problem of solving complex physical differential equations to statistical models. In this project, we will assess the capability of efficient ML methods when applied to the prediction of different molecular properties obtained with quantum chemistry calculations. The main focus will be on predicting molecular energies; however, the same ideas can be employed at a later stage to predict excited-state properties, such as polarizability, ionization potential or electron affinity.
Our final aim is to enable predictions of molecular energies close to 'chemical accuracy' at a small fraction of the cost of electronic structure calculations. Achieving this goal will allow us to rationally explore and analyze the structure and dimensionality of CCS.
The expected results of this project are: (a) a physical analysis (exploration) of CCS using optimal ML models, with a view to identifying important classes of molecules and understanding the dimensionality of CCS; (b) a rigorous assessment of the feasibility (capabilities as well as limitations) of using ML techniques for the prediction of molecular properties; and (c) a dataset of molecular properties and excited-state properties for a wide variety of molecules computed with different levels of theory.
DFG Programme
Research Grants