Project Details

Exploring Chemical Compound Space with Machine Learning

Subject Area: Theoretical Computer Science; Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term: from 2014 to 2017
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 253375148

Final Report Year: 2019

Final Report Abstract

The general objective of this project was to enable rational exploration of chemical compound space (CCS) by developing efficient and accurate machine learning (ML) models. The first important step was to assess the capabilities and limitations of ML techniques for the accurate prediction of molecular energies in CCS. The ML models were trained on a large set of reference energies computed with hybrid density-functional theory (DFT) including van der Waals interactions, as well as with the quantum-chemical “gold standard” CCSD(T) method, which represents the most accurate reference that is still computationally feasible.

As originally planned, the project developed in four intertwined directions: (1) generation of a large set of reference molecular energies; (2) development of a ladder of physical models (descriptors) for organic molecules, from classical charge repulsion to approximate electronic models, for use as input to the ML model; (3) application and analysis of efficient ML models (kernel-based learning such as Support Vector Machines and Gaussian processes, neural networks, etc.); (4) physical analysis (exploration) of the chemical compound space using optimal ML models. The last direction was pursued both from the point of view of computational complexity (dimensionality, sparsity, etc.) and in terms of the underlying chemistry (for example, whether classes of molecules can be identified in CCS).

Our work has led to a number of developments in the areas of data-driven representations of physical systems, advances in incorporating prior knowledge of the application domain, the development of a hierarchy of molecular and material descriptors, as well as a consolidated understanding of the demands placed on statistical models in atomistic simulations.

A novel and challenging aspect was that we allowed variations both in chemical composition and in configurational degrees of freedom (bonding and geometry). This required extensions of traditional ML models as well as novel, scalable, and physically meaningful representations, which in turn called for a joint effort between physics, chemistry, and computer science. In addition to modeling atomic interactions, we also aimed at fostering the understanding of ML-based potentials through the development of interpretable models. Our analysis revealed that the most effective statistical inference methods are able to recover and exercise chemical concepts in a fully data-driven way.
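
As a rough illustration of how a charge-repulsion descriptor can feed a kernel-based learner, the following Python sketch builds a Coulomb-matrix descriptor (pairwise nuclear charge repulsion) and fits a Gaussian kernel ridge regression model. The report does not name this descriptor or any hyperparameters; the function names, the eigenvalue-based vectorization, the assumption that positions are given in atomic units, and the parameters sigma and lam are all illustrative assumptions, not the project's actual implementation.

```python
import numpy as np


def coulomb_matrix(charges, positions):
    """Coulomb-matrix descriptor (illustrative choice, not named in the report).

    Off-diagonal entries are Z_i * Z_j / |R_i - R_j|; diagonal entries are
    0.5 * Z_i ** 2.4. Positions are assumed to be in atomic units (bohr).
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the division below is safe
    m = np.outer(z, z) / dist
    np.fill_diagonal(m, 0.5 * z ** 2.4)
    return m


def sorted_eigenvalues(m, size):
    """Fixed-length vector that does not depend on the atom ordering.

    Eigenvalues are sorted by decreasing magnitude and zero-padded to `size`.
    """
    ev = np.linalg.eigvalsh(m)
    if len(ev) > size:
        raise ValueError("size must be at least the number of atoms")
    ev = ev[np.argsort(-np.abs(ev))]
    out = np.zeros(size)
    out[: len(ev)] = ev
    return out


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_krr(x, y, sigma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + lam * np.eye(len(k)), np.asarray(y, dtype=float))


def predict_krr(x_train, alpha, x_new, sigma):
    """Predict targets for x_new from the fitted coefficients alpha."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha
```

For two hydrogen nuclei one atomic unit apart, `coulomb_matrix([1, 1], [[0, 0, 0], [0, 0, 1.0]])` has diagonal entries 0.5 and off-diagonal entries 1.0. In practice, each molecule would be mapped to a vector with `sorted_eigenvalues`, and the resulting vectors, paired with reference energies, would be passed to `fit_krr`.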

Publications

  • (2014). How to represent crystal structures for machine learning: Towards fast prediction of electronic properties. Physical Review B, 89(20), 205118
    Schütt, K. T., Glawe, H., Brockherde, F., Sanna, A., Müller, K. R., & Gross, E. K. U.
    (See online at https://doi.org/10.1103/PhysRevB.89.205118)
  • (2015). Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space. The Journal of Physical Chemistry Letters, 6(12), 2326-2331
    Hansen, K., Biegler, F., Ramakrishnan, R., Pronobis, W., Von Lilienfeld, O. A., Müller, K. R., & Tkatchenko, A.
    (See online at https://doi.org/10.1021/acs.jpclett.5b00831)
  • (2017). Machine Learning of Accurate Energy-conserving Molecular Force Fields. Science Advances, 3(5), e1603015
    Chmiela, S., Tkatchenko, A., Sauceda, H.E., Poltavsky, I., Schütt, K.T., Müller, K.-R.
    (See online at https://doi.org/10.1126/sciadv.1603015)
  • (2017). Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8, 13890
    Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., Tkatchenko, A.
    (See online at https://doi.org/10.1038/ncomms13890)
  • (2017). SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Advances in Neural Information Processing Systems, 30, 991–1001
    Schütt, K.T., Kindermans, P.-J., Sauceda, H.E., Chmiela, S., Tkatchenko, A., Müller, K.-R.
  • (2017). Bypassing the Kohn-Sham equations with machine learning. Nature Communications, 8(1), 872
    Brockherde, F., Vogt, L., Li, L., Tuckerman, M. E., Burke, K., & Müller, K. R.
    (See online at https://doi.org/10.1038/s41467-017-00839-3)
  • (2018). Towards Exact Molecular Dynamics Simulations with Machine-Learned Force Fields. Nature Communications, 9(1), 3887
    Chmiela, S., Sauceda, H. E., Müller, K.-R., Tkatchenko, A.
    (See online at https://doi.org/10.1038/s41467-018-06169-2)
  • (2018). Capturing intensive and extensive DFT/TDDFT molecular properties with machine learning. The European Physical Journal B, 91(8), 178
    Pronobis, W., Schütt, K. T., Tkatchenko, A., Müller, K. R.
    (See online at https://doi.org/10.1140/epjb/e2018-90148-y)
  • (2019). Molecular Force Fields with Gradient-Domain Machine Learning: Construction and Application to Dynamics of Small Molecules with Coupled Cluster Forces. The Journal of Chemical Physics, 150, 114102
    Sauceda, H.E., Chmiela, S., Poltavsky, I., Müller, K.-R., Tkatchenko, A.
    (See online at https://doi.org/10.1063/1.5078687)
  • (2019). sGDML: Constructing Accurate and Data Efficient Molecular Force Fields Using Machine Learning. Computer Physics Communications
    Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R., Tkatchenko, A.
    (See online at https://doi.org/10.1016/j.cpc.2019.02.007)
 
 
