2021
DOI: 10.1021/acsestengg.1c00125

Evaluation of a Data-Driven, Machine Learning Approach for Identifying Potential Candidates for Environmental Catalysts: From Database Development to Prediction

Abstract: Data-driven, machine learning approaches are increasingly used for the discovery and development of catalytic materials in the area of material science and engineering. In this paper, the approach was evaluated with respect to its applicability in identifying potential environmental catalysts (ECs) using the selective catalytic reduction (SCR) of the air pollutant NOx as an example. The detailed procedures including database assemblage, the training and testing of a machine learning model, the validation and…

citations
Cited by 8 publications
(12 citation statements)
references
References 54 publications
“…The number of hidden layers and neurons was adjusted to optimize the performance of the ANN model based on the values of the correlation coefficient R and root mean square error (RMSE) (eqs and ), respectively. As demonstrated previously, a three hidden-layer structure with 6, 4, and 2 neurons for each layer (Figure S1 of the Supporting Information) was an optimal combination for this problem

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{1}^{N} \left(P_{\mathrm{mesu}} - P_{\mathrm{pred}}\right)^{2}}{N}}$$

where $P_{\mathrm{mesu}}$ is the measured value and $P_{\mathrm{pred}}$ is the predicted value. $N$ is the total amount of data

$$R = \frac{\sum_{1}^{N} \left(P_{\mathrm{mesu}} - \bar{P}_{\mathrm{mesu}}\right)\left(P_{\mathrm{pred}} - \bar{P}_{\mathrm{pred}}\right)}{\sqrt{\sum_{1}^{N} \left(P_{\mathrm{mesu}} - \bar{P}_{\mathrm{mesu}}\right)^{2} \sum_{1}^{N} \left(P_{\mathrm{pred}} - \bar{P}_{\mathrm{pred}}\right)^{2}}}$$

where $P_{\mathrm{mesu}}$, $P_{\mathrm{pred}}$, $\bar{P}_{\mathrm{mesu}}$, and $\bar{P}_{\mathrm{pred}}$ are the measured, predicted, average of measured, and average of predicted values, respectively.…”
Section: Iterative Approach (mentioning)
confidence: 71%
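
The two metrics named in the statement above can be computed as in the following minimal sketch. It assumes the standard definitions of RMSE and the Pearson correlation coefficient; the function names and the sample values are hypothetical and do not come from the cited paper.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    squared_errors = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared_errors / n)


def pearson_r(measured, predicted):
    """Correlation coefficient R between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Numerator: sum of products of deviations from the two means.
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    # Denominator: square root of the product of the summed squared deviations.
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Hypothetical values, for illustration only (not data from the paper).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print("RMSE:", rmse(measured, predicted))
print("R:", pearson_r(measured, predicted))
```

A lower RMSE and an R closer to 1 both indicate closer agreement between predicted and measured values, which is how the statement describes selecting the hidden-layer configuration.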
“…This is also expected because additional data would improve the prediction as shown in a previous study (ref 18). This study demonstrated an iterative approach of the ML-based model and experiment to identify new catalysts that typically have sparse data in the literature. In the approach, the ML-based model was first trained from relevant data in the literature and was then used to screen candidate catalysts to guide the experimental step, and the experiment step is used to synthesize the candidate catalysts to evaluate the performance predicted from the ML model and to provide new data to retrain the ML model.…”
Section: Variable (mentioning)
confidence: 99%
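
The train, screen, experiment, and retrain loop described in the statement above can be sketched as follows. This is a toy illustration under stated assumptions, not the cited authors' method: the k-nearest-neighbour surrogate stands in for their ML model, `run_experiment` is a hypothetical stand-in for synthesizing and testing a catalyst, and the single composition variable and all numbers are invented.

```python
def knn_predict(data, x, k=2):
    """Surrogate model (assumption): mean of the k nearest measured points."""
    nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def run_experiment(x):
    """Hypothetical experiment: toy response with a peak at x = 0.6."""
    return 100.0 - 400.0 * (x - 0.6) ** 2


def iterate(data, candidates, rounds=3):
    """Alternate model-guided screening with new measurements."""
    data = list(data)
    tested = {x for x, _ in data}
    untested = [c for c in candidates if c not in tested]
    for _ in range(rounds):
        if not untested:
            break
        # Screen: pick the untested candidate with the highest prediction.
        best = max(untested, key=lambda c: knn_predict(data, c))
        # Experiment: measure it, then add the result to the training data.
        # For this surrogate, "retraining" is simply using the updated data.
        data.append((best, run_experiment(best)))
        untested.remove(best)
    return data


# Hypothetical "literature" data and candidate compositions.
literature = [(x, run_experiment(x)) for x in (0.0, 0.5, 1.0)]
candidates = [i / 10 for i in range(11)]
result = iterate(literature, candidates)
print(result[len(literature):])  # candidates chosen and measured in each round
```

Each round adds one measured point, so later predictions draw on more data, which mirrors the statement's expectation that additional data improves the prediction.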
“…Also, the loading amount of Zr was found to play an important role due to the fact that the Cr⁵⁺ species can reduce as the Zr loading amount increases, which can subsequently lower the NOx conversion efficiency (ref 88). In addition, an ML algorithm along with 27 descriptors was applied to 2228 experimental data obtained from the literature (ref 89) to predict the activity of heterogeneous catalysts, which reveals that temperature is the most important descriptor for the water-gas shift reaction (ref 90). Moreover, learning from a large database in nanoscience can be used for rapid design and discovery of new heterogeneous catalysts using ML.…”
Section: Integration of ML with Experiments (mentioning)
confidence: 99%
“…Synthesis-based methods, such as the meta-analyses or systematic reviews widely performed in the medical research community, minimize these challenges by evaluating entire bodies of research. Data synthesizability in water treatment research domains is often limited by the paucity of observations and inadequate data documentation, which hinders broader use of data for materials informatics research, uncertainty quantification efforts, and plant control schema. Strong synthesis research requires access to underlying study data, facile recombination with other interoperable study datasets, and holistic evaluation of the synthesized data. These methods help to elucidate the robustness of conclusions, pinpoint and manage discrepancies across findings, and identify key knowledge gaps warranting further inquiry…”
Section: FAIR/O Data Principles and Their General Benefits (mentioning)
confidence: 99%