Using molecular simulation for adsorbent screening is computationally expensive and thus prohibitive to materials discovery. Machine learning (ML) algorithms trained on fundamental material properties can potentially provide quick and accurate methods for screening purposes. Prior efforts have focused on structural descriptors for use with ML. In this work, the use of chemical descriptors, in addition to structural descriptors, was introduced for adsorption analysis. Evaluation of structural and chemical descriptors coupled with various ML algorithms, including decision tree, Poisson regression, support vector machine and random forest, was carried out to predict methane uptake on hypothetical metal organic frameworks (MOFs). To highlight their predictive capabilities, ML models were trained on 8% of a data set consisting of 130,398 MOFs and then tested on the remaining 92% to predict methane adsorption capacities. When structural and chemical descriptors were jointly used as ML input, the random forest model with 10-fold cross validation proved to be superior to the other ML approaches, with an R² of 0.98 and a mean absolute percent error of about 7%. The training and prediction using the random forest algorithm for adsorption capacity estimation of all 130,398 MOFs took approximately 2 h on a single personal computer, several orders of magnitude faster than actual molecular simulations on high-performance computing clusters.
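The two accuracy metrics quoted above can be made concrete with a short sketch. The function names and sample numbers below are illustrative assumptions, not values from the study. The goodness-of-fit measure is computed here as the coefficient of determination, 1 − SS_res/SS_tot, which is one common definition and may not be the exact one the authors used.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_percent_error(y_true: Sequence[float],
                                y_pred: Sequence[float]) -> float:
    """Mean of |true - predicted| / |true|, expressed in percent."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical simulated vs. predicted uptakes (arbitrary units, made up).
simulated = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 135.0, 210.0, 250.0]

print(round(r_squared(simulated, predicted), 3))                    # 0.966
print(round(mean_absolute_percent_error(simulated, predicted), 2))  # 6.25
```

Note that the percent error is undefined when a true value is zero, so a real screening pipeline would need to handle materials with zero simulated uptake separately.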
A recent report from the United Nations has warned about excessive CO2 emissions and the necessity of making efforts to keep the increase in global temperature below 2 °C. Current CO2 capture technologies are inadequate for reaching that goal, and effective mitigation strategies must be pursued. In this work, we summarize trends in materials development for CO2 adsorption with a focus on recent studies. We put adsorbent materials into four main groups: (I) carbon-based materials, (II) silica/alumina/zeolites, (III) porous crystalline solids, and (IV) metal oxides. Trends in computational investigations along with experimental findings are covered to find promising candidates in light of practical challenges imposed by process economics.
Superior performance in predicting the methane uptake capacity of hypothetical metal organic frameworks has previously been accomplished using a novel combination of structural and chemical features with machine learning (ML) algorithms. This concept is extended to additional microcrystalline materials, focusing on 69 839 covalent organic frameworks (COFs) and 17 846 porous polymer networks (PPNs). For each material category, data was divided into training (80%) and test (20%) sets. Using the random forest (RF) algorithm, 10-fold cross-validation was carried out to evaluate the robustness of prediction for structural and chemical descriptors. Structural features included surface area, density, and void fraction. Chemical descriptors included the number and type of each atom, electronegativity, and degree of unsaturation among others. When chemical descriptors for adsorption at low pressures were included, significant improvements for predictions were observed compared to solely using structural descriptors. Specifically, adding chemical features increased the R² value from 0.66 to 0.87 for COFs and from 0.83 to 0.93 for PPNs. These results indicate that inclusion of chemical descriptors improves prediction across materials and pressures. While physisorption is the main driver for adsorption at these pressures, these results also imply contribution of surface chemical motifs on adsorption phenomena.
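The 80/20 split followed by 10-fold cross-validation can be sketched in terms of index bookkeeping alone. This is a minimal illustration under stated assumptions: the helper names, the toy size of 1000 samples, and the choice to cross-validate on the training portion only are not taken from the source, which does not say how the folds were drawn.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, test_fraction: float,
                             seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train: List[int],
                   k: int) -> List[Tuple[List[int], List[int]]]:
    """Partition the training indices into k folds.

    Each fold serves once as the validation set while the remaining
    k - 1 folds are used for fitting.
    """
    folds = [train[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        fitting = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((fitting, validation))
    return splits


# Toy example: 1000 hypothetical materials, 20% held out, 10 folds.
train_idx, test_idx = train_test_split_indices(n=1000, test_fraction=0.2, seed=0)
splits = k_fold_indices(train_idx, k=10)
print(len(train_idx), len(test_idx), len(splits))  # 800 200 10
```

In use, a model would be fitted on each `fitting` list and scored on the matching `validation` list, with the held-out test indices touched only once at the end.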
The increased use of transition fuels, such as natural gas, and the resulting increase in methane emissions have resulted in a need for novel methane storage materials. Metal−organic frameworks (MOFs) have shown promise as efficient storage materials. A virtually limitless number of potential MOFs can be hypothesized, which exhibit a wide variety of different structural and chemical characteristics. Because of the numerous possibilities, identification of the best MOF for methane storage can be a challenging problem. In this work, determination of the best such MOF was cast as an inverse function problem. The function, a random forest (RF) model using 12 structural and chemical descriptors, was trained on 10% of a data set consisting of 130 398 hypothetical MOFs (hMOFs) to predict simulated methane uptake. The RF model was tested on the remaining 90% of the data. After validation, a genetic algorithm (GA) was used to evolve in silico the best MOFs for methane adsorption. The RF model was embedded into the GA as the fitness function to predict the methane uptake of the evolved MOFs (eMOFs). The best 15 eMOFs matched hMOFs found in the top 1% of the database. Nine of the 15 eMOFs were found in the top 0.1%. More impressively, two of the eMOFs matched the top two hypothetical MOFs with the highest methane uptake values out of the entire database of 130 398 MOFs. Further, by leveraging the ensemble nature of the GA, it was possible to characterize the importance of the different material properties for methane adsorption, providing fundamental insight for future material design strategies.
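The idea of using a trained predictor as the fitness function of a genetic algorithm can be sketched as follows. Everything specific here is an assumption for illustration: `surrogate_uptake` is a made-up quadratic standing in for the trained RF model, genomes are 12 numbers in [0, 1] rather than real MOF descriptors, and the selection, crossover, and mutation settings are generic GA choices, not the ones used in the study.

```python
import random
from typing import Callable, List, Tuple

Genome = List[float]


def surrogate_uptake(genome: Genome) -> float:
    """Hypothetical stand-in for a trained model's predicted uptake.

    Peaks at 0.0 when every gene equals 0.7; NOT a real adsorption model.
    """
    return -sum((g - 0.7) ** 2 for g in genome)


def evolve(fitness: Callable[[Genome], float], n_genes: int = 12,
           pop_size: int = 40, generations: int = 30,
           seed: int = 0) -> Tuple[Genome, List[float]]:
    """Run a simple GA; return the best genome and the per-generation best fitness."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: best genome survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of three random candidates.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation on ~10% of genes, clipped to [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.05)))
                     if rng.random() < 0.1 else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


best, history = evolve(surrogate_uptake)
print(history[0], history[-1], surrogate_uptake(best))
```

Because the best genome is carried over unchanged each generation, the recorded best fitness never decreases. In a setting like the one described, the evolved candidates would then be compared against the known database rather than trusted on the surrogate's prediction alone.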
Active learning is of great interest for many practical applications, especially in industry and the physical sciences, where there is a strong need to minimize the number of costly experiments necessary to train predictive models. However, there remain significant challenges for the adoption of active learning methods in many practical applications. One important challenge is that many methods assume a fixed model, where model hyperparameters are chosen a priori. In practice, it is rarely true that a good model will be known in advance. Existing methods for active learning with model selection typically depend on a medium-sized labeling budget. In this work, we focus on the case of having a very small labeling budget, on the order of a few dozen data points, and develop a simple and fast method for practical active learning with model selection. Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel. First, we show empirically that our method is able to find hyperparameters that lead to the best performance compared to an oracle model on less separable, difficult-to-classify datasets, and reasonable performance on datasets that are more separable and easier to classify. Then, we demonstrate that it is possible to refine our model selection method using a weighted approach to trade off between achieving optimal performance on datasets that are easy to classify and on datasets that are difficult to classify, which can be tuned based on prior domain knowledge about the dataset.
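A pool-based active learning loop with a small labeling budget can be sketched as below. This is not the method of the abstract: a signed RBF-kernel vote is swapped in for the support vector classifier so the example needs only the standard library, `gamma` is fixed rather than selected, and the model-selection step is not sketched at all. The two-cluster dataset, the seed points, and all function names are assumptions, and the data is deliberately trivially separable.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def rbf_score(x: Point, labeled: Dict[int, int], pool: List[Point],
              gamma: float) -> float:
    """Signed RBF-kernel vote of the labeled points (labels are -1 or +1).

    A simple stand-in for an SVC decision function, not an SVM.
    """
    return sum(
        y * math.exp(-gamma * ((x[0] - pool[i][0]) ** 2 + (x[1] - pool[i][1]) ** 2))
        for i, y in labeled.items()
    )


def active_learn(pool: List[Point], oracle: List[int], seeds: List[int],
                 budget: int, gamma: float = 1.0) -> Dict[int, int]:
    """Query `budget` labels by uncertainty sampling.

    Each round labels the unlabeled pool point whose score is closest to zero.
    """
    labeled = {i: oracle[i] for i in seeds}
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        query = min(unlabeled,
                    key=lambda i: abs(rbf_score(pool[i], labeled, pool, gamma)))
        labeled[query] = oracle[query]  # stands in for running a costly experiment
    return labeled


# Toy pool: two well-separated clusters of 50 points each (made-up data).
rng = random.Random(0)
pool = [(-2.5 + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(50)]
pool += [(2.5 + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(50)]
oracle = [-1] * 50 + [1] * 50

labeled = active_learn(pool, oracle, seeds=[0, 50], budget=10)
predictions = [1 if rbf_score(p, labeled, pool, 1.0) > 0 else -1 for p in pool]
accuracy = sum(p == y for p, y in zip(predictions, oracle)) / len(pool)
print(len(labeled), accuracy)  # 12 1.0
```

On data this easy the choice of `gamma` hardly matters. The difficulty the abstract addresses arises on less separable data, where the kernel width has to be chosen from the same few dozen labels.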