Bayesian Optimization Reduces Data Requirements for Quantum Learning Models

Summary: Researchers developed Minimal Multilevel Machine Learning (M3L) to optimize the training set sizes used at each level of theory in quantum machine learning models. Systematic Bayesian optimization of the ratios between levels cut training data acquisition costs by up to a factor of 25. The approach accelerates quantum chemistry computations, enabling greener materials discovery with reduced computational cost and carbon footprint.

Reducing Training Data Needs with Minimal Multilevel Machine Learning (M3L) (2023)

Corresponding Author: Anatole von Lilienfeld, Ph.D.

Title: Professor & Clark Chair of Advanced Materials @ the Vector Institute, University of Toronto

Ideal Audiences for this Paper:

  1. Computational chemists interested in more efficient quantum machine learning models.
  2. Materials scientists looking to accelerate materials discovery and design.
  3. Quantum computing researchers seeking ways to reduce computational costs.
  4. Machine learning experts looking for multilevel learning applications.
  5. Environmental scientists aiming to develop greener computational methods.
  6. Battery researchers hoping to find new electrolyte materials faster.
  7. Pharmaceutical scientists trying to explore chemical space more efficiently.

Notable Details in the Paper:

  1. Multilevel machine learning (M2L) trains on differences between levels of theory rather than on properties directly.
  2. M2L can reduce computational costs by using cheaper calculations to lift a target property onto a higher level of theory.
  3. Minimal multilevel machine learning (M3L) optimizes M2L by systematically finding the best ratios of training set sizes at each level.
  4. M3L achieved up to 25 times lower training data acquisition costs compared to conventional methods.
  5. Bayesian optimization reliably found optimal ratios, outperforming heuristic approaches.
  6. Learning curves, which describe how prediction error falls as training set size grows, provided a quick initial estimate of optimal level ratios for the multilevel models.
  7. This intuitive application of learning theory helped guide the more computationally expensive Bayesian optimization.
  8. M3L was tested on various quantum chemistry datasets computed at levels of theory including CCSD(T), MP2, and Hartree-Fock.
  9. For atomization energies of organic molecules, M3L reached chemical accuracy with 684 high-level training points, 13.8 times fewer than previously needed.
  10. M3L reduced computational carbon footprint, supporting greener computational science.
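
The multilevel idea in items 1 and 2 can be illustrated with a small two-level sketch. This is not the authors' code: the two "levels of theory" are synthetic one-dimensional functions, the kernel ridge regression is a bare NumPy implementation, and all names (low_level, high_level, krr_fit, krr_predict) and hyperparameters are assumptions chosen for illustration. The point is only that a model of the cheap level plus a model of the smooth difference can beat a direct model trained on the same few expensive points.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1D point sets."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def krr_fit(x, y, sigma=0.1, lam=1e-6):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, sigma=0.1):
    """Predict at x_new from the fitted weights."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# Hypothetical stand-ins for a cheap and an expensive level of theory.
def low_level(x):
    return np.sin(20.0 * x)

def high_level(x):
    return np.sin(20.0 * x) + 0.3 * x

x_low = np.linspace(0.0, 1.0, 50)    # many cheap calculations
x_high = np.linspace(0.0, 1.0, 5)    # few expensive calculations
x_test = np.linspace(0.0, 1.0, 200)

# Direct learning: fit the high level from the 5 expensive points only.
alpha_direct = krr_fit(x_high, high_level(x_high))
pred_direct = krr_predict(x_high, alpha_direct, x_test)

# Multilevel learning: baseline on the cheap level, plus a model of the
# difference (high - low) trained on the same 5 expensive points.
alpha_base = krr_fit(x_low, low_level(x_low))
alpha_delta = krr_fit(x_high, high_level(x_high) - low_level(x_high))
pred_multi = (krr_predict(x_low, alpha_base, x_test)
              + krr_predict(x_high, alpha_delta, x_test))

mae_direct = np.mean(np.abs(pred_direct - high_level(x_test)))
mae_multi = np.mean(np.abs(pred_multi - high_level(x_test)))
print(f"direct MAE: {mae_direct:.3f}, multilevel MAE: {mae_multi:.3f}")
```

In this toy setup the split of 50 cheap to 5 expensive points is picked by hand; choosing such ratios systematically is the part that M3L addresses.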

Key Methods:

  1. Bayesian optimization
  2. Kernel Ridge Regression (KRR)
  3. Multilevel Learning
  4. Learning Curves
  5. MBDF representations
  6. QM7b, QM9, EGP datasets
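
Learning curves of this kind are commonly modeled as power laws, error ≈ a * N^(-b), which appear as straight lines on a log-log plot. The sketch below fits such a curve and extrapolates the training set size needed for a target error. The data are synthetic, and the function names and the specific power-law form are assumptions for illustration rather than the paper's procedure.

```python
import numpy as np

def fit_learning_curve(n_train, errors):
    """Fit error = a * N**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(n_train), np.log(errors), 1)
    return np.exp(intercept), -slope  # (a, b)

def n_required(a, b, target_error):
    """Invert the power law: training set size at which error == target."""
    return (a / target_error) ** (1.0 / b)

# Synthetic learning curve (hypothetical numbers, arbitrary error units).
n_train = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
errors = 20.0 * n_train ** -0.5

a, b = fit_learning_curve(n_train, errors)
n_needed = n_required(a, b, target_error=0.25)
print(f"a = {a:.2f}, b = {b:.2f}, N for target error: {n_needed:.0f}")
```

Fitting one such curve per level gives a rough sense of how much data each level needs, which is the kind of quick estimate that can seed a more expensive search over level ratios.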

Reference

Heinen, S., Khan, D., von Rudorff, G. F., Karandashev, K., Arrieta, D. J. A., Price, A. J., … & von Lilienfeld, O. A. (2023). Reducing Training Data Needs with Minimal Multilevel Machine Learning (M3L). arXiv preprint arXiv:2308.11196.
