2023 ScalingDeepLearningforMaterials
- (Merchant et al., 2023) ⇒ Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. (2023). “Scaling Deep Learning for Materials Discovery.” In: Nature. doi:10.1038/s41586-023-06735-9
Subject Headings: Graph Networks for Materials Exploration (GNoME), Material Stability Prediction, Inorganic Crystal Structure Database, Inorganic Crystal, Cathode Active Material.
Notes
- It presents a new deep learning approach called Graph Networks for Materials Exploration (GNoME) for efficiently discovering new stable inorganic crystal structures. The approach uses Graph Neural Networks (GNNs), specifically, message-passing neural networks that operate on graph representations of materials structures and compositions.
- It leverages graph neural networks trained on large datasets of DFT calculations to predict candidate materials' stability accurately. The models are continuously improved through iterative active learning as more data becomes available.
- It discovered over 2.2 million new stable crystal structures, expanding current materials databases by almost an order of magnitude. Many structures contain complex chemistry with 5+ elements.
- It found 381,000 of the discovered crystals are on the updated convex hull, meaning they are thermodynamically stable. 736 have already been experimentally realized.
- It produced a massive dataset that also enables training of accurate interatomic potentials for molecular dynamics simulations. This allows screening for properties like ionic conductivity with unprecedented computational efficiency.
- It demonstrates the power of scaled-up deep learning for significantly accelerating materials discovery and modelling capabilities. The authors have made the structures and software available to enable further research.
Cited By
Quotes
Abstract
Novel functional materials enable fundamental breakthroughs across technological applications from clean energy to information processing1,2,3,4,5,6,7,8,9,10,11. From microchips to batteries and photovoltaics, discovery of inorganic crystals has been bottlenecked by expensive trial-and-error approaches. Concurrently, deep-learning models for language, vision and biology have showcased emergent predictive capabilities with increasing data and computation12,13,14. Here we show that graph networks trained at scale can reach unprecedented levels of generalization, improving the efficiency of materials discovery by an order of magnitude. Building on 48,000 stable crystals identified in continuing studies15,16,17, improved efficiency enables the discovery of 2.2 million structures below the current convex hull, many of which escaped previous human chemical intuition. Our work represents an order-of-magnitude expansion in stable materials known to humanity. Stable discoveries that are on the final convex hull will be made available to screen for technological applications, as we demonstrate for layered materials and solid-electrolyte candidates. Of the stable structures, 736 have already been independently experimentally realized. The scale and diversity of hundreds of millions of first-principles calculations also unlock modelling capabilities for downstream applications, leading in particular to highly accurate and robust learned interatomic potentials that can be used in condensed-phase molecular-dynamics simulations and high-fidelity zero-shot prediction of ionic conductivity.
Main
The discovery of energetically favourable inorganic crystals is of fundamental scientific and technological interest in solid-state chemistry. Experimental approaches over the decades have catalogued 20,000 computationally stable structures (out of a total of 200,000 entries) in the Inorganic Crystal Structure Database (ICSD)15,18. However, this strategy is impractical to scale owing to costs, throughput and synthesis complications19. Instead, computational approaches championed by the Materials Project (MP)16, the Open Quantum Materials Database (OQMD)17, AFLOWLIB20 and NOMAD21 have used first-principles calculations based on density functional theory (DFT) as approximations of physical energies. Combining ab initio calculations with simple substitutions has allowed researchers to improve to 48,000 computationally stable materials according to our own recalculations22,23,24 (see Methods). Although data-driven methods that aid in further materials discovery have been pursued, thus far, machine-learning techniques have been ineffective in estimating stability (decomposition energy) with respect to the convex hull of energies from competing phases25.
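As an illustration of what stability "with respect to the convex hull" means, the following is a minimal sketch for a hypothetical binary system A-B. It is not the paper's implementation: the function names, the toy formation energies and the restriction to two elements are all assumptions made for this example. The decomposition energy of a candidate is taken as its formation energy minus the energy of the lower convex hull of competing phases at the same composition, so a negative value means the candidate lies below the current hull.

```python
from typing import List, Tuple

Phase = Tuple[float, float]  # (fraction of element B, formation energy in eV/atom)


def lower_hull(phases: List[Phase]) -> List[Phase]:
    """Lower convex hull of (composition, energy) points (monotone chain)."""
    # Keep only the lowest-energy phase at each composition.
    best = {}
    for x, e in phases:
        best[x] = min(e, best.get(x, e))
    hull: List[Phase] = []
    for p in sorted(best.items()):
        # Drop the last hull point while it lies on or above the line to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(hull: List[Phase], x: float) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (x - x1) / (x2 - x1) * (y2 - y1)
    raise ValueError("composition outside the range covered by the hull")


def decomposition_energy(competing: List[Phase], candidate: Phase) -> float:
    """Candidate energy relative to the hull of competing phases (eV/atom)."""
    x, e = candidate
    return e - hull_energy(lower_hull(competing), x)


# Toy data (hypothetical): two elemental references and one known compound.
known = [(0.0, 0.0), (1.0, 0.0), (0.5, -0.40)]
below = decomposition_energy(known, (0.25, -0.30))  # negative: below the hull
above = decomposition_energy(known, (0.75, -0.10))  # positive: above the hull
```

Real multi-component systems need a hull in more than two dimensions; this sketch only shows the idea in the simplest case.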
In this paper, we scale up machine learning for materials exploration through large-scale active learning, yielding the first models that accurately predict stability and, therefore, can guide materials discovery. Our approach relies on two pillars: first, we establish methods for generating diverse candidate structures, including new symmetry-aware partial substitutions (SAPS) and random structure search26. Second, we use state-of-the-art graph neural networks (GNNs) that improve modelling of material properties given structure or composition. In a series of rounds, these graph networks for materials exploration (GNoME) are trained on available data and used to filter candidate structures. The energy of the filtered candidates is computed using DFT, both verifying model predictions and serving as a data flywheel to train more robust models on larger datasets in the next round of active learning.
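The round-based procedure described above can be sketched as a toy loop. Everything here is a stand-in chosen for illustration, not the GNoME pipeline: `dft_oracle` is a hypothetical cheap function playing the role of DFT, the surrogate model is a 1-nearest-neighbour lookup rather than a graph network, candidates are single numbers rather than crystals, and ranking by predicted energy replaces the paper's filtering step.

```python
import math
import random


def dft_oracle(x: float) -> float:
    """Stand-in for an expensive DFT energy calculation (toy function)."""
    return math.sin(3.0 * x) + 0.5 * x


def predict(train, x: float) -> float:
    """Toy surrogate: energy of the closest labelled point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def run_active_learning(n_rounds=3, pool_size=200, budget=20,
                        threshold=-0.5, seed=0):
    """Return a list of (training-set size, hit rate) per round."""
    rng = random.Random(seed)
    # Initial labelled data, analogous to an existing materials database.
    train = [(x, dft_oracle(x))
             for x in (rng.uniform(-2.0, 2.0) for _ in range(10))]
    history = []
    for _ in range(n_rounds):
        # 1. Generate a pool of candidates.
        pool = [rng.uniform(-2.0, 2.0) for _ in range(pool_size)]
        # 2. Rank by the model and keep the most promising ones.
        chosen = sorted(pool, key=lambda x: predict(train, x))[:budget]
        # 3. Verify the chosen candidates with the expensive oracle.
        labelled = [(x, dft_oracle(x)) for x in chosen]
        hits = sum(1 for _, e in labelled if e < threshold)
        history.append((len(train), hits / budget))
        # 4. Feed verified results back into the training set.
        train.extend(labelled)
    return history


history = run_active_learning()
```

Each round both checks the model's picks and enlarges the labelled set used in the next round, which is the "data flywheel" idea in miniature.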
Through this iterative procedure, GNoME models have discovered more than 2.2 million structures stable with respect to previous work, in particular agglomerated datasets encompassing computational and experimental structures15,16,17,27. Given that discovered materials compete for stability, the updated convex hull consists of 381,000 new entries for a total of 421,000 stable crystals, representing an order-of-magnitude expansion from all previous discoveries. Consistent with observations in other domains of machine learning28, we observe that our neural network predictions improve as a power law with the amount of data. Final GNoME models accurately predict energies to 11 meV atom⁻¹ and improve the precision of stable predictions (hit rate) to above 80% with structure and 33% per 100 trials with composition only, compared with 1% in previous work17. Moreover, these networks develop emergent out-of-distribution generalization. For example, GNoME enables accurate predictions of structures with 5+ unique elements (despite omission from training), providing one of the first strategies to efficiently explore this chemical space. We validate findings by comparing predictions with experiments and higher-fidelity r2SCAN (ref. 29) computations.
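A power-law relation between error and dataset size, error ≈ a · N^(−b), appears as a straight line on log-log axes, so its parameters can be estimated with an ordinary least-squares fit in log space. The sketch below shows that fit on synthetic numbers generated from an assumed a = 2.0 and b = 0.5; these values are invented for the example and are not results from the paper.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], errors: List[float]) -> Tuple[float, float]:
    """Fit error = a * size**(-b) by least squares on log-log data."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Synthetic data only: errors generated from an assumed power law.
sizes = [1e3, 1e4, 1e5, 1e6]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
```

With real training curves the points scatter around the line, and the fitted exponent b summarizes how quickly error falls as data grows.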
Finally, we demonstrate that the dataset produced in GNoME discovery unlocks new modelling capabilities for downstream applications. The structures and relaxation trajectories present a large and diverse dataset to enable training of learned, equivariant interatomic potentials30,31 with unprecedented accuracy and zero-shot generalization. We demonstrate the promise of these potentials for materials property prediction through the estimation of ionic conductivity from molecular-dynamics simulations.
Overview of generation and filtration
The space of possible materials is far too large to sample in an unbiased manner. Without a reliable model to cheaply approximate the energy of candidates, researchers guided searches by restricting generation with chemical intuition, accomplished by substituting similar ions or enumerating prototypes22. Although improving search efficiency17,27, this strategy fundamentally limited how diverse candidates could be. By guiding searches with neural networks, we are able to use diversified methods for generating candidates and perform a broader exploration of crystal space without sacrificing efficiency.
To generate and filter candidates, we use two frameworks, which are visualized in Fig. 1a. First, structural candidates are generated by modifications of available crystals. However, we strongly augment the set of substitutions by adjusting ionic substitution probabilities to give priority to discovery and use newly proposed symmetry-aware partial substitutions (SAPS) to efficiently enable incomplete replacements32. This expansion results in more than 10⁹ candidates over the course of active learning; the resulting structures are filtered by means of GNoME using volume-based test-time augmentation and uncertainty quantification through deep ensembles33. Finally, structures are clustered and polymorphs are ranked for evaluation with DFT (see Methods). In the second framework, compositional models predict stability without structural information. Inputs are reduced chemical formulas. Generation by means of oxidation-state balancing is often too strict (for example, neglecting Li15Si4). Using relaxed constraints (see Methods), we filter compositions using GNoME and initialize 100 random structures for evaluation through ab initio random structure searching (AIRSS)26. In both frameworks, models provide a prediction of energy and a threshold is chosen on the basis of the relative stability (decomposition energy) with respect to competing phases. Evaluation is performed through DFT computations in the Vienna Ab initio Simulation Package (VASP)34 and we measure both the number of stable materials discovered as well as the precision of predicted stable materials (hit rate) in comparison with the Materials Project16.
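One way to picture the filtering and evaluation steps above is the sketch below. It assumes each candidate has one predicted decomposition energy per ensemble member, keeps candidates whose ensemble mean falls below a threshold and whose ensemble spread is small, and then computes the hit rate as the fraction of selected candidates that a later check confirms as stable. The `max_std` cut-off, the candidate labels and all numbers are hypothetical; the source does not specify how the ensemble uncertainty enters the selection.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def filter_candidates(ensemble_preds: Dict[str, List[float]],
                      threshold: float = 0.0,
                      max_std: float = 0.05) -> List[Tuple[str, float, float]]:
    """Keep candidates predicted stable (mean < threshold) with low spread.

    ensemble_preds maps a candidate id to the decomposition energies
    (eV/atom) predicted by each ensemble member.
    """
    kept = []
    for cid, preds in ensemble_preds.items():
        mu, sigma = mean(preds), pstdev(preds)
        if mu < threshold and sigma <= max_std:
            kept.append((cid, mu, sigma))
    return sorted(kept, key=lambda item: item[1])  # most stable first


def hit_rate(selected: List[str], verified: Dict[str, float]) -> float:
    """Fraction of selected candidates whose verified energy is below 0."""
    if not selected:
        return 0.0
    return sum(1 for cid in selected if verified[cid] < 0.0) / len(selected)


# Hypothetical predictions from a three-member ensemble.
preds = {
    "A": [-0.05, -0.04, -0.06],  # stable, members agree
    "B": [0.02, 0.03, 0.01],     # predicted unstable
    "C": [-0.20, 0.10, -0.08],   # members disagree strongly
    "D": [-0.10, -0.11, -0.09],  # stable, members agree
}
selected = [cid for cid, _, _ in filter_candidates(preds)]
# Hypothetical verification energies standing in for DFT results.
rate = hit_rate(selected, {"A": 0.01, "D": -0.09})
```

Tightening `threshold` or `max_std` trades the number of candidates sent for verification against the precision of those that are.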
...
References
- (Green et al., 2014) ⇒ M. A. Green, A. Ho-Baillie, and H. J. Snaith. (2014). “The Emergence of Perovskite Solar Cells.” In: Nature Photonics, 8, 506–514.
- (Mizushima et al., 1980) ⇒ K. Mizushima, P. Jones, P. Wiseman, and J. B. Goodenough. (1980). “LixCoO2 (0<x≤1): A New Cathode Material for Batteries of High Energy Density.” In: Materials Research Bulletin, 15, 783–789.
- (Bednorz & Müller, 1986) ⇒ J. G. Bednorz and K. A. Müller. (1986). “Possible High Tc Superconductivity in the Ba–La–Cu–O System.” In: Zeitschrift für Physik B Condensed Matter, 64, 189–193.
- (Ceder et al., 1998) ⇒ G. Ceder et al. (1998). “Identification of Cathode Materials for Lithium Batteries Guided by First-Principles Calculations.” In: Nature, 392, 694–696.
- (Tabor et al., 2018) ⇒ D. P. Tabor et al. (2018). “Accelerating the Discovery of Materials for Clean Energy in the Era of Smart Automation.” In: Nature Reviews Materials, 3, 5–20.
- (Liu et al., 2020) ⇒ C. Liu et al. (2020). “Two-Dimensional Materials for Next-Generation Computing Technologies.” In: Nature Nanotechnology, 15, 545–557.
- (Nørskov et al., 2009) ⇒ J. K. Nørskov, T. Bligaard, J. Rossmeisl, and C. H. Christensen. (2009). “Towards the Computational Design of Solid Catalysts.” In: Nature Chemistry, 1, 37–46.
- (Greeley et al., 2006) ⇒ J. Greeley, T. F. Jaramillo, J. Bonde, I. Chorkendorff, and J. K. Nørskov. (2006). “Computational High-Throughput Screening of Electrocatalytic Materials for Hydrogen Evolution.” In: Nature Materials, 5, 909–913.
- (Gómez-Bombarelli et al., 2016) ⇒ R. Gómez-Bombarelli et al. (2016). “Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach.” In: Nature Materials, 15, 1120–1127.
- (de Leon et al., 2021) ⇒ N. P. de Leon et al. (2021). “Materials Challenges and Opportunities for Quantum Computing Hardware.” In: Science, 372, eabb2823.
- (Wedig et al., 2016) ⇒ A. Wedig et al. (2016). “Nanoscale Cation Motion in TaOx, HfOx and TiOx Memristive Systems.” In: Nature Nanotechnology, 11, 67–74.
- (Brown et al., 2020) ⇒ T. Brown et al. (2020). “Language Models are Few-Shot Learners.” In: Advances in Neural Information Processing Systems, 33, 1877–1901.
- (Dosovitskiy et al., 2021) ⇒ A. Dosovitskiy et al. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” In: International Conference on Learning Representations (ICLR, 2021); https://openreview.net/forum?id=YicbFdNTTy
- (Jumper et al., 2021) ⇒ J. Jumper et al. (2021). “Highly Accurate Protein Structure Prediction with AlphaFold.” In: Nature, 596, 583–589.
- (Hellenbrandt, 2004) ⇒ M. Hellenbrandt. (2004). “The Inorganic Crystal Structure Database (ICSD)—Present and Future.” In: Crystallography Reviews, 10, 17–22.
- (Jain et al., 2013) ⇒ A. Jain et al. (2013). “Commentary: The Materials Project: A Materials Genome Approach to Accelerating Materials Innovation.” In: APL Materials, 1, 011002.
- (Saal et al., 2013) ⇒ J. E. Saal, S. Kirklin, M. Aykol, B. Meredig, and C. Wolverton. (2013). “Materials Design and Discovery with High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD).” In: JOM, 65, 1501–1509.
- (Belsky et al., 2002) ⇒ A. Belsky, M. Hellenbrandt, V. L. Karen, and P. Luksch. (2002). “New Developments in the Inorganic Crystal Structure Database (ICSD): Accessibility in Support of Materials Research and Design.” In: Acta Crystallographica Section B Structural Science, 58, 364–369.
- (Aykol et al., 2021) ⇒ M. Aykol, J. H. Montoya, and J. Hummelshøj. (2021). “Rational Solid-State Synthesis Routes for Inorganic Materials.” In: Journal of the American Chemical Society, 143, 9244–9259.
- (Curtarolo et al., 2012) ⇒ S. Curtarolo et al. (2012). “AFLOWLIB.ORG: A Distributed Materials Properties Repository from High-Throughput ab Initio Calculations.” In: Computational Materials Science, 58, 227–235.
- (Draxl & Scheffler, 2019) ⇒ C. Draxl and M. Scheffler. (2019). “The NOMAD Laboratory: From Data Sharing to Artificial Intelligence.” In: Journal of Physics: Materials, 2, 036001.
- (Hautier et al., 2011) ⇒ G. Hautier, C. Fischer, V. Ehrlacher, A. Jain, and G. Ceder. (2011). “Data Mined Ionic Substitutions for the Discovery of New Compounds.” In: Inorganic Chemistry, 50, 656–663.
- (Ong et al., 2013) ⇒ S. P. Ong et al. (2013). “Python Materials Genomics (pymatgen): A Robust, Open-Source Python Library for Materials Analysis.” In: Computational Materials Science, 68, 314–319.
- (Aykol et al., 2019) ⇒ M. Aykol et al. (2019). “Network Analysis of Synthesizable Materials Discovery.” In: Nature Communications, 10, 2018.
- (Bartel et al., 2020) ⇒ C. J. Bartel et al. (2020). “A Critical Examination of Compound Stability Predictions from Machine-Learned Formation Energies.” In: npj Computational Materials, 6, 97.
- (Pickard & Needs, 2011) ⇒ C. J. Pickard and R. Needs. (2011). “Ab initio Random Structure Searching.” In: Journal of Physics: Condensed Matter, 23, 053201.
- (Wang et al., 2021) ⇒ H.-C. Wang, S. Botti, and M. A. Marques. (2021). “Predicting Stable Crystalline Compounds Using Chemical Similarity.” In: npj Computational Materials, 7, 12.
- (Hestness et al., 2017) ⇒ J. Hestness et al. (2017). “Deep Learning Scaling is Predictable, Empirically.” Preprint at https://arxiv.org/abs/1712.00409.
- (Furness et al., 2020) ⇒ J. W. Furness, A. D. Kaplan, J. Ning, J. P. Perdew, and J. Sun. (2020). “Accurate and Numerically Efficient r2SCAN Meta-Generalized Gradient Approximation.” In: Journal of Physical Chemistry Letters, 11, 8208–8215.
- (Batzner et al., 2022) ⇒ S. Batzner et al. (2022). “E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials.” In: Nature Communications, 13, 2453.
- (Thomas et al., 2018) ⇒ N. Thomas et al. (2018). “Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds.” Preprint at
...
...
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023 ScalingDeepLearningforMaterials | Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, Ekin Dogus Cubuk | | | Scaling Deep Learning for Materials Discovery | | | | 10.1038/s41586-023-06735-9 | | 2023 |