2023
DOI: 10.1021/acs.jcim.3c00546
Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning

Abstract: Knowledge of critical properties, such as the critical temperature, pressure, and density, as well as the acentric factor, is essential for calculating thermophysical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time-intensive; therefore, we developed a machine learning (ML) model that can predict these molecular properties given the SMILES representation of a chemical species. We explored directed message passing neural network (D-MPNN) and graph atte…

Cited by 11 publications (4 citation statements) · References 104 publications
“…Others have suggested the use of advanced aggregation techniques, such as attention mechanisms, to mix feature channels between solvent and solute, but our initial experiments suggested concatenation to be more robust. Vermeire and Green found that using additional features such as polar surface area as computed by RDKit can improve performance, and several other studies have also explored augmenting D-MPNNs with a variety of heuristic, quantum mechanical, and chemical descriptors. However, the effect of such descriptors on model performance is not very consistent, and given the large number of solute conformers here, we do not use additional features in this work. We note that the aforementioned studies concern 2D D-MPNNs, and exploring how additional descriptors impact the model architecture adopted here could be a topic of future work.…”
Section: Methods
confidence: 99%
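The statement above contrasts attention-based mixing with plain concatenation of the two feature channels. A minimal sketch of the concatenation option, assuming the D-MPNN encoders have already produced fixed-length learned embeddings for the solvent and the solute (the vectors below are toy placeholders, not values from the paper):

```python
def combine_embeddings(solvent_emb, solute_emb):
    """Concatenate solvent and solute embeddings into one feature vector,
    which would then be passed to a feed-forward readout network."""
    return list(solvent_emb) + list(solute_emb)

solvent = [0.1, -0.4, 0.7]   # toy 3-dimensional solvent embedding
solute = [0.9, 0.2]          # toy 2-dimensional solute embedding
combined = combine_embeddings(solvent, solute)
# combined has length len(solvent) + len(solute)
```

The appeal of concatenation, as the authors note, is robustness: it introduces no extra trainable parameters, unlike an attention mechanism that must learn how to weight the two channels.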
“…With a set of four equations (eqs 12–15) that are linear with respect to a, b, c, and d, we can solve for those four parameters based on ΔΔG‡_solv(298 K), ΔΔH‡_solv(298 K), and the saturation densities ρ_l(T) and critical properties (T_c, ρ_c) of a solvent. The densities and properties of many common solvents can be obtained using open-source models, such as the fluid thermodynamics packages CoolProp [33] and Clapeyron.jl [34], and a machine learning model developed by Biswas et al. [35] Since the solvent's properties are easily obtainable from existing models, we can treat them as known values. Thus, only ΔΔG‡_solv(298 K) and ΔΔH‡_solv(298 K) are needed to solve for the four empirical parameters and hence obtain ΔΔG‡_solv at any temperature.…”
Section: −E_a/RT
confidence: 99%
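The key computational step described above is solving a 4×4 linear system for the empirical parameters a, b, c, and d. A minimal sketch of such a solve, assuming the coefficient matrix and right-hand side have already been assembled from eqs 12–15 of the cited work (the numbers below are placeholders, not real thermodynamic coefficients):

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    A is an n x n list of lists; b is a length-n list."""
    n = len(A)
    # Build the augmented matrix [A | b]
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate entries below the pivot
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Placeholder 4x4 system standing in for eqs 12-15:
a, b_, c, d = solve_linear(
    [[2.0, 0.0, 0.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0],
     [0.0, 0.0, 0.0, 5.0]],
    [2.0, 6.0, 4.0, 10.0],
)
```

In practice one would fill the matrix with the coefficients implied by the known solvent properties (T_c, ρ_c, ρ_l(T)) and the right-hand side with ΔΔG‡_solv(298 K) and ΔΔH‡_solv(298 K); the solve itself is the same.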
“…If we are interested in assessing model performance on new molecules, we can train a model with many reaction templates but use substructure splitting to create training, validation, and testing sets. Bemis-Murcko scaffolds [70] are commonly used to partition the data for this purpose, though clustering based on other input features or chemical similarity to measure extrapolation has also been explored [23, 71–88], as has quantifying domains of model applicability [89–93]. Scaffold splitting is not perfect, but by ensuring that molecules in the testing set are structurally different from those in the training set, it offers a much better assessment of generalizability than splitting randomly [17, 24, 67, 94–109].…”
Section: Interpolation vs Extrapolation
confidence: 99%
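The scaffold-splitting procedure described above can be sketched as a group-by-scaffold partition: every molecule sharing a scaffold must land in the same split. This sketch assumes the scaffold strings have already been computed (in practice via RDKit's MurckoScaffold utilities); the greedy largest-group-first assignment is one common heuristic, not the only one:

```python
from collections import defaultdict

def scaffold_split(mol_ids, scaffolds, frac_train=0.8, frac_val=0.1):
    """Partition molecules into train/val/test so that all members of a
    scaffold group land in the same split, preventing structural leakage
    between training and testing sets."""
    groups = defaultdict(list)
    for mid, scaf in zip(mol_ids, scaffolds):
        groups[scaf].append(mid)
    # Assign the largest scaffold groups to training first (greedy heuristic)
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(mol_ids)
    train, val, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(val) + len(group) <= frac_val * n:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test

# Toy example: 10 molecules sharing 3 hypothetical scaffolds
ids = list(range(10))
scafs = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
train, val, test = scaffold_split(ids, scafs)
```

Because whole scaffold groups move together, any molecule in the test set is guaranteed to have a scaffold unseen during training, which is exactly the property that makes this split a stricter generalizability test than a random one.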