Identifying optimal synthesis conditions for metal-organic frameworks (MOFs) is a major challenge that can serve as a bottleneck for new materials discovery and development. A trial-and-error approach that relies on a chemist's intuition and knowledge is limited in efficiency by the sheer size of the MOF synthesis space. To this end, synthesis information for 46,701 MOFs was extracted from 28,565 MOF papers using our in-house developed text-mining code. The joint machine-learning/rule-based algorithm yields an average F1 score of 90.3% across the extracted synthesis parameters (i.e., metal precursors, organic precursors, solvents, temperature, time, and composition). From this data set, a positive-unlabeled learning algorithm was developed to predict the synthesizability of a given MOF from its synthesis conditions, and it correctly identified 83.1% of the synthesized materials in the test set. Finally, our model correctly assigned low synthesizability scores to three amorphous MOFs (under their representative experimental synthesis conditions), while their crystalline counterparts received high scores. These results show that big data extracted from the texts of MOF papers can be used to rationally predict synthesis conditions for these materials, accelerating the rate at which new MOFs are synthesized.
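The abstract does not specify which positive-unlabeled (PU) formulation was used, so the following is only a minimal sketch of one standard approach, bagging PU learning (Mordelet & Vert, 2014), applied to synthesizability scoring. The feature matrix `X` (encoded synthesis conditions) and the index array `positive_idx` (rows known to be successfully synthesized) are hypothetical placeholders, not artifacts of the paper's pipeline.

```python
# Sketch: bagging-PU synthesizability scoring with scikit-learn.
# Assumptions: X is an (n_samples, n_features) numpy array of encoded
# synthesis conditions; positive_idx indexes the known-synthesized rows;
# every other row is unlabeled (not known to be unsynthesizable).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pu_bagging_scores(X, positive_idx, n_rounds=50, seed=0):
    """Average out-of-bag synthesizability scores for unlabeled rows."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), positive_idx)
    scores = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_rounds):
        # Draw a pseudo-negative set from the unlabeled pool, matched in
        # size to the positive (synthesized) set.
        neg = rng.choice(unlabeled_idx, size=len(positive_idx), replace=False)
        train = np.concatenate([positive_idx, neg])
        y = np.concatenate([np.ones(len(positive_idx)), np.zeros(len(neg))])
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[train], y)
        # Score only the unlabeled rows held out of this round's training.
        oob = np.setdiff1d(unlabeled_idx, neg)
        scores[oob] += clf.predict_proba(X[oob])[:, 1]
        counts[oob] += 1
    # Known-positive rows are left at 0; only unlabeled rows get scores.
    return np.divide(scores, counts, out=np.zeros(n), where=counts > 0)
```

Averaging over many random pseudo-negative draws is what lets a PU scheme tolerate the absence of confirmed failures: any single draw mislabels some synthesizable-but-unreported structures as negative, but the bias washes out across rounds.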
Porous materials have emerged as a promising solution for a wide range of energy and environmental applications. However, the asymmetric development of the MOF field has created a data imbalance between MOFs and other porous materials such as COFs, PPNs, and zeolites. To address this issue, we introduce PMTransformer (Porous Material Transformer), a multi-modal Transformer model pre-trained on a vast dataset of 1.9 million hypothetical porous materials, including metal-organic frameworks (MOFs), covalent-organic frameworks (COFs), porous polymer networks (PPNs), and zeolites. PMTransformer showcases remarkable transfer learning capabilities, achieving state-of-the-art performance in predicting various porous material properties. To address the challenge of asymmetric data aggregation, we propose cross-material few-shot learning, which leverages the synergy among different porous material classes to enhance fine-tuning performance with a limited number of examples. As a proof of concept, we demonstrate its effectiveness in predicting the bandgaps of COFs using the MOF data available in the training set. Moreover, we establish cross-material relationships by predicting unseen properties of other classes of porous materials. Our approach offers a new pathway for understanding the underlying relationships between classes of porous materials, paving the way toward a more comprehensive understanding and design of porous materials.
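To make the cross-material few-shot idea concrete, here is a minimal two-stage fine-tuning sketch: a property head on a pre-trained encoder is first fit on the abundant source class (MOF bandgaps), then adapted with a handful of target-class examples (COFs). The encoder class and data loaders are generic PyTorch stand-ins, not the actual PMTransformer API.

```python
# Sketch: two-stage (source-rich -> target-scarce) fine-tuning in PyTorch.
# Assumptions: `encoder` is a pre-trained backbone returning a pooled
# (batch, hidden_dim) representation; loaders yield (batch, target) pairs.
import torch
import torch.nn as nn

class PropertyHead(nn.Module):
    def __init__(self, encoder, hidden_dim=768):
        super().__init__()
        self.encoder = encoder                 # pre-trained backbone
        self.head = nn.Linear(hidden_dim, 1)   # scalar property (bandgap)

    def forward(self, batch):
        return self.head(self.encoder(batch)).squeeze(-1)

def finetune(model, loader, epochs, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for batch, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(batch), y)
            loss.backward()
            opt.step()

# Stage 1: fit on the abundant source class (MOF bandgap labels).
#   finetune(model, mof_loader, epochs=20, lr=1e-4)
# Stage 2: few-shot adaptation on the scarce target class (COFs), at a
# smaller learning rate so the MOF-derived signal is not overwritten.
#   finetune(model, cof_few_shot_loader, epochs=50, lr=1e-5)
```

The design choice that matters here is sharing one encoder across material classes: the few COF examples only need to nudge a representation that MOF data has already shaped, rather than learn structure-property relationships from scratch.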
Metal-organic frameworks (MOFs) are a class of crystalline porous materials that exhibit a vast chemical space due to their tunable molecular building blocks with diverse topologies. Given that an unlimited number of MOFs can, in principle, be synthesized, constructing structure-property relationships through machine learning allows for efficient exploration of this vast chemical space and identification of optimal candidates with desired properties. In this work, we introduce MOFTransformer, a multi-modal Transformer encoder pre-trained on 1 million hypothetical MOFs. The model uses integrated atom-based graph and energy-grid embeddings to capture the local and global features of MOFs, respectively. By fine-tuning the pre-trained model with small datasets ranging from 5,000 to 20,000 MOFs, our model achieves state-of-the-art results in predicting various properties, including gas adsorption, diffusion, electronic properties, and even text-mined data. Beyond its universal transfer learning capabilities, MOFTransformer generates chemical insight by analyzing feature importance through the attention scores of its self-attention layers. As such, this model can serve as a bedrock platform for other MOF researchers who seek to develop new machine learning models for their work.
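The abstract's claim about feature importance from attention scores admits a simple generic realization; the sketch below shows one common proxy, not MOFTransformer's actual interpretability routine. It assumes a HuggingFace-style model that can return per-layer attention tensors when called with `output_attentions=True`; the function name and interface are illustrative.

```python
# Sketch: attention-based feature importance. Average the last layer's
# [CLS]-to-token attention over heads to rank which input tokens (here,
# atom-graph or energy-grid embeddings) most influence a prediction.
import torch

@torch.no_grad()
def cls_attention_importance(model, batch):
    out = model(batch, output_attentions=True)
    # out.attentions: tuple of tensors, one per layer,
    # each shaped (batch, heads, seq_len, seq_len).
    last = out.attentions[-1]
    # Attention paid by the [CLS] token (position 0) to every token,
    # averaged over heads -> one importance score per input token.
    return last[:, :, 0, :].mean(dim=1)   # shape: (batch, seq_len)
```

Because the two modalities occupy distinct token positions in the sequence, scores like these can be split back into graph-token and grid-token contributions, which is what makes a local-versus-global attribution of a predicted property possible.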