2022
DOI: 10.1021/acs.est.2c04945

Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals

Abstract: Machine learning (ML) provides an efficient manner for rapid prediction of the life-cycle environmental impacts of chemicals, but challenges remain due to low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing by using a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data, which improved the model interpretability by preserving the physicochemical meanings o…
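
The abstract is truncated here and does not spell out how the MI-PI procedure works, so the following is only a minimal sketch of one plausible two-stage filter, not the authors' implementation. It assumes a histogram estimate of mutual information for the first stage and a small k-nearest-neighbour regressor as the model for the permutation-importance stage. The function names, the threshold of 0.1, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram estimate of mutual information (in nats) between two numeric lists."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant columns
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(x)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    # sum over observed cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def knn_predict(train_X, train_y, row, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)))
    return sum(train_y[i] for i in order[:k]) / k


def mse(train_X, train_y, X, y):
    """Mean squared error of the k-NN model on the rows in X."""
    return sum((knn_predict(train_X, train_y, r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def permutation_importance(train_X, train_y, val_X, val_y, col, rng, repeats=5):
    """Average increase in validation MSE after shuffling one column."""
    base = mse(train_X, train_y, val_X, val_y)
    increases = []
    for _ in range(repeats):
        shuffled = [r[col] for r in val_X]
        rng.shuffle(shuffled)
        permuted = [r[:col] + [s] + r[col + 1:] for r, s in zip(val_X, shuffled)]
        increases.append(mse(train_X, train_y, permuted, val_y) - base)
    return sum(increases) / repeats


def mi_pi_select(X, y, mi_threshold=0.1, seed=0):
    """Hypothetical two-stage filter: MI screening, then permutation importance."""
    rng = random.Random(seed)
    # Stage 1: drop descriptors that share little information with the target.
    kept = [j for j in range(len(X[0]))
            if mutual_information([r[j] for r in X], y) > mi_threshold]
    if not kept:
        return []
    # Stage 2: keep descriptors whose shuffling degrades a held-out fit.
    reduced = [[r[j] for j in kept] for r in X]
    split = int(0.75 * len(X))
    tr_X, tr_y = reduced[:split], y[:split]
    va_X, va_y = reduced[split:], y[split:]
    return [j for pos, j in enumerate(kept)
            if permutation_importance(tr_X, tr_y, va_X, va_y, pos, rng) > 0]


# Synthetic example: only column 0 drives the target; columns 1 and 2 are noise.
data_rng = random.Random(0)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3 * r[0] + data_rng.gauss(0, 0.1) for r in X]
selected = mi_pi_select(X, y)
print(selected)
```

Because the filter keeps original descriptor columns rather than projecting them into new components, each surviving index still refers to a named descriptor, which is the interpretability argument the abstract makes. In practice a library such as scikit-learn offers estimators for both stages; the pure-Python versions above only keep the sketch self-contained.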

Cited by 18 publications
(14 citation statements)
References 64 publications
“…After data cleaning, they accounted for the PCFs of 547 unique organic chemicals, which are significantly larger than the data sets used in previous studies (Figure S1). Most data from the industry have not been included in any public LCA databases and have resulted in a structurally more diverse training data set than previous works (Figure S2). In addition, chemicals in our new data set have more widely distributed physicochemical properties and PCFs than the chemicals that were previously used for modeling (Figures S1 and S3).…”
Section: Results | Citation type: mentioning
confidence: 99%
“…ML models are usually considered “black boxes” due to their poor interpretability. In response, previous studies have used a SHAP-based approach to identify the most important molecular descriptors that affect the PCFs of chemicals. However, it could not provide clear insights on, for example, PCF-intensive substructures and raw materials to guide the design of more sustainable molecules and processes.…”
Section: Results | Citation type: mentioning
confidence: 99%
“…Traditional research focuses on the element and energy flows in chemical enterprises or clusters, e.g., Ozalp [15] gave energy and material flow models of hydrogen production via steam reforming of methane in the United States, Chae et al. [16] developed a mathematical model to synthesize the waste heat utilization network targeting a petrochemical complex, Tian et al. analyzed the carbon metabolism [17], sulfur metabolism [18], and energy metabolism [19] in a typical fine CIP, Guo et al. [20] analyzed the carbon element flows in a natural gas CIP, and Ma et al. [5] presented a general chlorine metabolic model at the industrial park level. Since the chemical flows involving multiple units and phases often form a complex network, programming algorithms are effective in related studies, including a genetic algorithm by Wu and Wang [21] and improved machine learning models by data processing by Sun et al. [22]. Some studies focused on the network flow analysis algorithms to address the complexity of chemical industrial chains, e.g., Ren et al. [23] proposed an algorithm methodology to calculate the resource productivity of crude oil from a complex network perspective.…”
Section: Introduction | Citation type: mentioning
confidence: 99%