2021
DOI: 10.1021/acs.iecr.1c02142

Development of Solubility Prediction Models with Ensemble Learning

Abstract: The solubility parameter is widely used to select suitable solvents for polymers in the polymer-processing industry. In this study, we established a Hildebrand solubility parameter prediction model using ensemble-learning methods. The database used in the study is from the 2019 edition of the DIPPR 801 database, which includes solubility parameters for 1889 chemicals after removing invalid entries and outliers. Three machine-learning techniques including random forest, gradient boosting, and extreme gradient (…)

Cited by 19 publications (9 citation statements)
References 27 publications
“…The number of MOF descriptors in the data set was large compared to the data set size. Therefore, a reduction in the number of descriptors is needed, since the accuracy of a machine-learning model depends on the quality of the descriptor data. As depicted in Figure , descriptor screening and statistical significance determination are crucial to ensure that the model includes only descriptors highly correlated with the target property. A 10–15% ratio of descriptors to the total number of molecules in the data set is standard.…”
Section: Methods
confidence: 99%
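The correlation-based descriptor screening described in that statement can be sketched as follows. This is a minimal illustration, not the cited paper's actual pipeline: the descriptor names, the toy data, and the 12% keep-ratio (a point inside the quoted 10–15% rule of thumb) are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_descriptors(descriptors, target, ratio=0.12):
    """Keep the descriptors most correlated with the target property,
    capping their count at ratio * number of molecules (the 10-15%
    rule of thumb quoted above)."""
    n_keep = max(1, int(ratio * len(target)))
    ranked = sorted(descriptors.items(),
                    key=lambda kv: abs(pearson(kv[1], target)),
                    reverse=True)
    return [name for name, _ in ranked[:n_keep]]

# Toy example: 10 "molecules", 3 hypothetical descriptors.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
descriptors = {
    "molar_volume": [2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 14.2, 15.9, 18.1, 20.0],
    "ring_count":   [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
    "noise":        [0.5, -0.2, 0.1, 0.4, -0.3, 0.2, -0.1, 0.3, 0.0, -0.4],
}
print(screen_descriptors(descriptors, target))  # → ['molar_volume']
```

With 10 molecules and a 12% ratio, only the single descriptor most correlated with the target survives the screen.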
“…The child decision nodes will keep repeating the splitting process until stopping criteria are met (e.g., the DT exceeding a maximum depth). A single DT is highly interpretable but prone to overfitting. Therefore, ensemble tree methods like random forest (RF) and gradient boosting (GB), which use a technique called “sampling with replacement,” were developed to prevent overfitting.…”
Section: Methods
confidence: 99%
“…A single DT is highly interpretable but prone to overfitting. 38 Therefore, ensemble tree methods like random forest (RF) 39 and gradient boosting (GB) 40 using a technique called “sampling with replacement” were developed to prevent overfitting. Instead of establishing a single DT, RF constructs a number of DTs (the “forest”) with randomly selected features to minimize feature correlation.…”
Section: Machine Learning Algorithms
confidence: 99%
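The "sampling with replacement" idea quoted in these statements is bootstrap aggregation: each tree is fit on a resample of the training set drawn with replacement, and the forest averages their predictions to damp single-tree overfitting. A minimal sketch, with one-feature decision stumps standing in for full decision trees (the stump and the toy data are illustrative assumptions, not the cited paper's models):

```python
import random
import statistics

def bootstrap_sample(xs, ys, rng):
    """Draw len(xs) training points with replacement (the bootstrap)."""
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    return [xs[i] for i in idx], [ys[i] for i in idx]

def fit_stump(xs, ys):
    """A depth-1 'tree': split at the median x, predict the mean y on each side."""
    split = statistics.median(xs)
    left = [y for x, y in zip(xs, ys) if x <= split] or ys
    right = [y for x, y in zip(xs, ys) if x > split] or ys
    return split, statistics.mean(left), statistics.mean(right)

def predict_stump(stump, x):
    split, left_mean, right_mean = stump
    return left_mean if x <= split else right_mean

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit each stump on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(xs, ys, rng)) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Average the trees' predictions, which damps single-tree overfitting."""
    return statistics.mean(predict_stump(s, x) for s in forest)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 0.9, 1.0, 1.2, 3.9, 4.1, 4.0, 3.8]  # step function plus noise
forest = fit_forest(xs, ys)
print(predict_forest(forest, 2), predict_forest(forest, 7))
```

The averaged predictions land near 1 on the left of the step and near 4 on the right; a real RF adds random feature selection per split on top of this, as the last quoted sentence notes.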
“…This model has two components: the back-propagation neural network cross (BPC) and support vector regression (SVR). Ensemble learning 15 builds and combines multiple learners to accomplish a learning task; the combination usually achieves significantly better generalization performance than a single learner and improves the prediction ability and stability of the model. The bagging method of ensemble learning divides the data set and recombines it into multiple training sets to improve the learning effect, 16 and the BPC model is based on this idea.…”
Section: Introduction
confidence: 99%
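Combining two different base learners, as with the BPC and SVR components described above, can be sketched as a weighted average of their predictions. The two "learners" here are a least-squares line and a constant mean predictor, stand-ins chosen for brevity (not actual neural networks or SVR), and the 0.7/0.3 weights are an illustrative assumption:

```python
def fit_linear(xs, ys):
    """Least-squares line y = a*x + b; stand-in for one base learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_mean(xs, ys):
    """A deliberately crude second learner: always predict the mean y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def combine(models, weights):
    """Ensemble prediction: weighted average of the base learners' outputs."""
    total = sum(weights)
    return lambda x: sum(w * m(x) for m, w in zip(models, weights)) / total

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]   # roughly y = 2x
linear = fit_linear(xs, ys)
mean = fit_mean(xs, ys)
both = combine([linear, mean], weights=[0.7, 0.3])
print(round(linear(6), 2), round(mean(6), 2), round(both(6), 2))
```

In practice the weights would be tuned (or the combiner itself learned, as in stacking) so that the ensemble's generalization beats either component alone, which is the point the quoted introduction makes.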