Project Details

Importance sampling of chemical compound space: Thermodynamic properties from high-throughput coarse-grained simulations

Subject Area: Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term: from 2016 to 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 285228850

Final Report Year: 2022

Final Report Abstract

Coarse-grained (CG) modeling has been a workhorse of multiscale modeling for soft matter for two main reasons: a reductionist (“physicist-type”) approach and a lower computational load to sample conformational space. This project leverages a different reason: compound screening. Transferable CG models can significantly reduce the size of chemical space by introducing degeneracy: similar molecules map to the same CG representation. Our approach showed a systematic way to compute complex thermodynamic properties for many compounds, orders of magnitude more than previously achieved. The insertion of small organic solutes in single-component phospholipid membranes revolves around the notion of a potential of mean force (PMF), which describes the free-energy profile across the interface. We generated PMFs for all one- and two-bead representations of the CG Martini force field. This analysis yielded (i) the general collection of PMF shapes; (ii) linear relations between important thermodynamic quantities, especially water/octanol partitioning; and (iii) a database of PMFs for 4 × 10^5 compounds. We extended our computational screening approach to passive permeability coefficients. Our large-scale analysis established for the first time a permeability surface, describing how permeability changes as a function of crucial physicochemical parameters. The established structure–property relationships link important functional groups to the target property, and even enable inverse molecular design by suggesting relevant chemical groups for a desired permeability coefficient. A scale-up of the chemical space covered led to the design of an importance sampling strategy that combines Monte Carlo simulations and machine learning. On a fundamental level, we used information-theoretic tools to better understand what it means to build top-down CG models that target chemical space as a whole.
Thermodynamic accuracy can be reached with a small number of bead types, which also enables a hierarchical screening strategy. Applications to membrane-specific and phase-altering compounds demonstrate the benefits of the method for deriving clear design rules, and even suggest candidates for experimental testing. Unlike high-throughput calculations targeting electronic properties (e.g., from density functional theory; DFT), our high-throughput coarse-graining (HTCG) scheme requires low computational investment. While high-throughput DFT first runs all necessary calculations and later seeks to coarse-grain the relevant information (e.g., through unsupervised learning), HTCG uses the underlying physics to coarse-grain first, before running any simulation. This leads to a lower computational load and, critically, a simplified structure–property relationship. In essence, the method combines a physical understanding of the problem with a data-centric approach to screening. HTCG complements recent efforts in the field of explainable/interpretable machine learning.
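
As a rough illustration of how a PMF can feed a permeability estimate, the sketch below uses the inhomogeneous solubility-diffusion model, 1/P = ∫ exp(G(z)/kT) / D(z) dz. This model is a common choice in the membrane-permeation literature and is an assumption here; the abstract does not state which relation the project used. The function names, the toy Gaussian barrier, and the constant diffusivity are hypothetical and chosen only to keep the example self-contained.

```python
import numpy as np

# Molar gas constant in kJ/(mol K), so PMFs are expected in kJ/mol.
KB = 0.0083144626


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def permeability(z, pmf, diffusivity, temperature=300.0):
    """Permeability from the inhomogeneous solubility-diffusion model.

    z           : positions along the membrane normal (nm)
    pmf         : free-energy profile G(z) relative to bulk water (kJ/mol)
    diffusivity : local diffusivity D(z) (nm^2/ns)
    Returns P in nm/ns, where 1/P = integral of exp(G/kT) / D over z.
    """
    beta = 1.0 / (KB * temperature)
    resistance = trapezoid(np.exp(beta * pmf) / diffusivity, z)
    return 1.0 / resistance


# Hypothetical toy input (not project data): a 10 kJ/mol Gaussian barrier
# at the bilayer center and a constant diffusivity of 1 nm^2/ns.
z = np.linspace(-2.0, 2.0, 401)
pmf_barrier = 10.0 * np.exp(-z**2 / 0.5)
diffusivity = np.ones_like(z)

p_flat = permeability(z, np.zeros_like(z), diffusivity)
p_barrier = permeability(z, pmf_barrier, diffusivity)
print(f"flat PMF:    P = {p_flat:.4f} nm/ns")
print(f"barrier PMF: P = {p_barrier:.4f} nm/ns")
```

With a flat profile the integral reduces to the slab thickness divided by D, so the barrier case can be read directly as a reduction relative to that baseline. Because each CG representation stands for many compounds, one such evaluation per PMF would cover a whole group of molecules at once.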

