Natural sciences
- Inorganic chemistry
- Organic chemistry
- Theoretical and computational chemistry
- Other chemical sciences
Statistical-learning approaches are emerging as powerful alternatives to computationally expensive
methods that solve the Schrödinger equation to determine molecular properties. Despite the
recent success of methods such as neural networks, these models are only suitable for interpolation
and fail to scale to larger systems. That is, when a model is trained on small- to medium-sized
molecules, it can only be applied to systems of similar size. Modeling long-range intermolecular
interactions with machine learning (ML) requires sampling the vast diversity of chemical
environments that occur on an extended length scale. This leads to a combinatorial explosion in the
required amount of training data.
To circumvent these obstacles, I propose to incorporate our physical knowledge of long-range
interactions into the modeling process; this is philosophically different from the commonly used
“black box ML modeling”. A detailed analysis of the proposed model reveals that it can achieve the
accuracy of high-level quantum chemistry at the cost of molecular mechanics. This not only allows
one to compute interaction energies of large molecules (e.g., drug–target binding) and run long-time
molecular dynamics simulations of (macro)molecules, but also enables accurate and efficient
computational screening of large databases to select the most promising molecules for follow-up
experiments. Beyond its transformative utility, this pioneering strategy can be extended to many
other problems in chemistry.
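The idea of building known long-range physics into the model, rather than asking an ML model to learn it, can be illustrated with a minimal sketch. This is not the proposed model: it is a generic hybrid scheme under stated assumptions. The short-range term is a hypothetical, user-supplied pairwise function (`short_range_model`) that stands in for a trained ML model and is smoothly switched off beyond a cutoff. Long-range interactions come from analytic physics: Coulomb interactions between fixed partial charges plus a damped −C6/R^6 dispersion term (the Fermi-type damping and combination rules resemble Grimme's D2 correction). All names, cutoffs, and parameters are illustrative, and atomic units are assumed throughout.

```python
import math
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cosine_switch(r: float, r_on: float, r_off: float) -> float:
    """Smoothly go from 1 (r <= r_on) to 0 (r >= r_off)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    x = (r - r_on) / (r_off - r_on)
    return 0.5 * (1.0 + math.cos(math.pi * x))


def fermi_damping(r: float, r0: float, d: float = 20.0) -> float:
    """Fermi-type damping that suppresses dispersion at short range."""
    return 1.0 / (1.0 + math.exp(-d * (r / r0 - 1.0)))


def long_range_energy(coords: Sequence[Vec3], charges: Sequence[float],
                      c6: Sequence[float], r0: Sequence[float]) -> float:
    """Analytic long-range physics: Coulomb + damped -C6/R^6 dispersion."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            energy += charges[i] * charges[j] / r      # Coulomb (atomic units)
            c6_ij = math.sqrt(c6[i] * c6[j])           # geometric-mean combination
            r0_ij = r0[i] + r0[j]                      # sum of atomic radii
            energy -= fermi_damping(r, r0_ij) * c6_ij / r ** 6
    return energy


def hybrid_energy(coords: Sequence[Vec3],
                  short_range_model: Callable[[float], float],
                  charges: Sequence[float], c6: Sequence[float],
                  r0: Sequence[float],
                  r_on: float = 4.0, r_off: float = 6.0) -> float:
    """Local (hypothetical ML) term inside the cutoff + physics beyond it."""
    e_short = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            w = cosine_switch(r, r_on, r_off)
            if w > 0.0:  # the learned term only sees local pairs
                e_short += w * short_range_model(r)
    return e_short + long_range_energy(coords, charges, c6, r0)


# Usage: two atoms 10 bohr apart lie outside the ML cutoff,
# so only the analytic long-range terms contribute.
toy_model = lambda r: 5.0  # placeholder for a trained short-range model
e = hybrid_energy([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], toy_model,
                  charges=[1.0, -1.0], c6=[0.0, 0.0], r0=[1.0, 1.0])
print(e)
```

Because the learned term is confined to a fixed local cutoff, the amount of training data it needs does not grow with system size. The long-range behavior is then supplied by physics that is valid at any separation.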