Project Details
Accurate Molecular Mechanics Force Fields through Data-driven Parameter Type Definitions
Applicant
Dr. Tobias Hüfner
Subject Area
Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Biophysics
Computer-Aided Design of Materials and Simulation of Materials Behaviour from Atomic to Microscopic Scale
Term
from 2021 to 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 462118626
Molecular processes are complex, and only a fraction of their details is discernible by experimental techniques. However, many applications require predicting the relevant molecular details. In this context, atomistic simulations have become increasingly important to probe the properties and interactions of (bio)molecules. Although these simulations can be theoretically sound, they are not necessarily accurate, and a key source of error is the underlying molecular mechanics force field, which relates a given molecular structure to atomic forces. Today, a major hurdle to improving force fields is the lack of rigor in the schemes used to cast atoms into categories for the assignment of force field parameters. These categories, commonly termed parameter types, group similar chemical environments (i.e., substructures) and assign a common set of parameters to the atoms within these chemical environments. To avoid overfitting and facilitate parameter optimization, these types should be as few as possible while still enabling good agreement between computed and reference (experimental or high-level quantum chemistry calculation) molecular properties. However, parameter types have historically been assigned in a largely ad hoc manner. This prevents both the rigorous optimization of force field parameters as new reference data becomes available and the straightforward introduction of new chemical substructures into existing force fields. Here, I propose a novel approach that overcomes these obstacles through the combined data-driven optimization of force field parameter type definitions and force field parameter values. The approach is fundamentally different from existing force field optimization approaches, which only tune or add parameters in a given force field.
In the proposed project, Bayesian inference and Monte Carlo sampling algorithms will be applied to sample parameter type definitions, in order to obtain force fields that are highly accurate while having as few types as necessary (and thus being as simple as possible). At any given step of the parameter sampling process, existing parameter types are either merged or split into new ones. Since the number of possible merging or splitting operations is vast, parameter types will be represented through quantum-level atomic features, thus enabling a computable, physics-based description of a given chemical environment. The significance of the proposed work lies in its fundamentally data-driven and rigorous way of building force fields without restriction to a particular functional form or application domain of the force field. Furthermore, the developed approach will make force fields easily extensible when new reference data becomes available, an important aspect in materials design and drug discovery. Finally, the impact of the research will be maximized by implementing the developed technology in an open-source Python package.
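The merge/split sampling idea described above can be illustrated with a toy sketch. It is not the project's implementation: the function names (`log_posterior`, `propose`, `sample_types`), the single scalar feature per atom, the Gaussian likelihood, and the per-type penalty are all assumptions made for illustration. Each type carries one parameter (the mean of its members' reference values), and the acceptance rule omits the proposal-ratio correction that a rigorous sampler over a varying number of types would need.

```python
import math
import random


def log_posterior(types, ref, sigma=0.1, type_penalty=2.0):
    """Toy score: Gaussian log-likelihood plus a penalty per parameter type."""
    sq_err = 0.0
    for members in types:
        # Assumption: each type has one parameter, the mean reference value.
        mean = sum(ref[i] for i in members) / len(members)
        sq_err += sum((ref[i] - mean) ** 2 for i in members)
    return -0.5 * sq_err / sigma**2 - type_penalty * len(types)


def propose(types, feature, rng):
    """Return a copy of the partition with two types merged or one type split."""
    types = [list(t) for t in types]
    if len(types) > 1 and rng.random() < 0.5:
        # Merge move: join two randomly chosen types.
        a, b = rng.sample(range(len(types)), 2)
        merged = types[a] + types[b]
        types = [t for k, t in enumerate(types) if k not in (a, b)]
        types.append(merged)
        return types
    # Split move: cut one type at a random threshold along the feature axis.
    k = rng.randrange(len(types))
    members = sorted(types[k], key=lambda i: feature[i])
    if len(members) < 2:
        return types  # a single-atom type cannot be split
    cut = rng.randrange(1, len(members))
    types[k] = members[:cut]
    types.append(members[cut:])
    return types


def sample_types(feature, ref, n_steps=2000, seed=0):
    """Metropolis-style search over type definitions; returns the best one seen."""
    rng = random.Random(seed)
    current = [list(range(len(ref)))]  # start with all atoms in one type
    current_lp = log_posterior(current, ref)
    best, best_lp = current, current_lp
    for _ in range(n_steps):
        cand = propose(current, feature, rng)
        cand_lp = log_posterior(cand, ref)
        # Simplified acceptance: no correction for asymmetric proposals.
        if cand_lp >= current_lp or rng.random() < math.exp(cand_lp - current_lp):
            current, current_lp = cand, cand_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best


# Hypothetical data: six atoms in two clearly distinct chemical environments.
feature = [0.10, 0.12, 0.11, 0.90, 0.92, 0.91]
ref = [1.00, 1.02, 0.98, 2.00, 2.01, 1.99]
best = sample_types(feature, ref)
print(sorted(sorted(t) for t in best))
```

In this sketch the penalty term plays the role of a prior favoring few types, so a split is kept only when it improves agreement with the reference data by more than the cost of the extra type.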
DFG Programme
WBP Position