ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction

Wu, Jialu; Wang, Junmei; Wu, Zhenxing; Zhang, Shengyu; Deng, Yafeng; Kang, Yu; Cao, Dong-Sheng; Hsieh, Chang-Yu; Hou, Tingjun

doi:10.1021/acs.jcim.2c01290

Cited by 8 publications

(13 citation statements)

References 59 publications


“…For example, sometimes classical ML techniques as simple as decision trees or random forest can be quite effective at estimating kinetic parameters [28] or ranking reaction conformations [29], and we do not doubt that there are some molecular problems and/or datasets where KRR or other architectures may have some advantages over GNN-based approaches. Our argument instead is that comparing model performance to relevant baselines (and to experimental data when available [30][31][32][33][34][35][36][37][38][39][40][41][42]) should never be 'beyond the scope' of work for a model developer. Rather, it is an essential first step at convincing readers that the model is useful.…”

Section: Providing Context For Model Performance (mentioning, confidence: 99%)

Comment on ‘Physics-based representations for machine learning properties of chemical reactions’

Spiekermann, Stuyver, Pattanaik et al. 2023

Mach. Learn.: Sci. Technol.


In a recent article in this journal, van Gerwen et al. (2022 Mach. Learn.: Sci. Technol. 3 045005) presented a kernel ridge regression model to predict reaction barrier heights. Here, we comment on the utility of that model and present references and results that contradict several statements made in that article. Our primary interest is to offer a broader perspective by presenting three aspects that are essential for researchers to consider when creating models for chemical kinetics: (1) Are the model’s prediction targets and associated errors sufficient for practical applications? (2) Does the model prioritize user-friendly inputs so it is practical for others to integrate into prediction workflows? (3) Does the analysis report performance on both interpolative and more challenging extrapolative data splits so users have a realistic idea of the likely errors in the model’s predictions?
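Point (3) contrasts interpolative and extrapolative data splits. A minimal Python sketch of the difference, assuming the data carry a group label such as a reaction family: a random split scatters every group across train and test, while a group split holds out whole groups so the test set contains nothing the model has seen a close relative of. The function names, reaction IDs, and family labels are hypothetical illustrations, not taken from the comment or from van Gerwen et al.

```python
import random

def random_split(items, test_frac=0.2, seed=0):
    """Interpolative split: shuffle all items and hold out a random fraction.
    Test items usually closely resemble some training items."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]

def group_split(items, group_of, test_groups):
    """Extrapolative split: every item whose group is in test_groups goes to
    the test set, so no held-out group is seen during training."""
    train = [x for x in items if group_of(x) not in test_groups]
    test = [x for x in items if group_of(x) in test_groups]
    return train, test

# Hypothetical reactions tagged with a hypothetical reaction-family label.
reactions = [
    ("rxn1", "H-abstraction"), ("rxn2", "H-abstraction"),
    ("rxn3", "addition"), ("rxn4", "addition"),
    ("rxn5", "ring-opening"), ("rxn6", "ring-opening"),
]

train_i, test_i = random_split(reactions, test_frac=1 / 3)
train_e, test_e = group_split(reactions, group_of=lambda r: r[1],
                              test_groups={"ring-opening"})

# In the extrapolative split, no reaction family appears on both sides.
assert not {r[1] for r in train_e} & {r[1] for r in test_e}
```

Reporting errors on both kinds of split, as the comment recommends, shows how much a model's accuracy depends on having seen similar examples during training.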


“…Most tools that extract information innate to protein sequences either utilize the structural hierarchy framework or work without a hierarchical framework. Examples include predicting disease-associated mutations [6][7][8][9][10][11][12], predicting solubility [13][14][15], detecting aggregating regions [16][17][18][19][20], predicting intrinsic disorder [18][21][22][23][24][25], designing protein sequences with specific features [26][27][28], and comparing sequences [29,30]. Some of these tools use a fixed-width moving window to define local sequence context, which artificially places the considered residue at the center of its "local sequence" and ignores any natural boundaries present within proteins.…”

Section: Introduction (mentioning, confidence: 99%)

The blobulator: a webtool for identification and visual exploration of hydrophobic modularity in protein sequences

Pitman, Santiago-McRae, Lohia et al. 2024

Preprint


Clusters of hydrophobic residues are known to promote structured protein stability and drive protein aggregation. Recent work has shown that identifying contiguous hydrophobic residue clusters (termed 'blobs') is useful in both intrinsically disordered protein (IDP) simulation and human genome studies. However, no graphical interface for this analysis was available. Here, we present the blobulator: an interactive and intuitive web interface to detect intrinsic modularity in any protein sequence based on hydrophobicity. We demonstrate three use cases of the blobulator and show how identifying blobs with biologically relevant parameters provides useful information about a globular protein, two orthologous membrane proteins, and an IDP. Other potential applications are discussed, including predicting protein segments with critical roles in tertiary interactions, providing a definition of local order and disorder with clear edges, and aiding in predicting protein features from sequence. The blobulator GUI can be found at www.blobulator.branniganlab.org, and the source code, including a pip-installable command-line tool, can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
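The general idea of finding contiguous hydrophobic clusters can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the blobulator's implementation (its actual algorithm and parameters are in the GitHub repository): the Kyte-Doolittle scale, the normalization to [0, 1], the 3-residue smoothing window, the 0.5 threshold, and the 4-residue minimum length are all illustrative choices, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
# Only the 20 standard residues are covered; other letters raise KeyError.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def smoothed_hydropathy(seq, window=3):
    """Map KD values from [-4.5, 4.5] to [0, 1], then average each residue
    with its neighbors over a centered window (truncated at sequence ends)."""
    norm = [(KD[aa] + 4.5) / 9.0 for aa in seq]
    half = window // 2
    out = []
    for i in range(len(norm)):
        lo, hi = max(0, i - half), min(len(norm), i + half + 1)
        out.append(sum(norm[lo:hi]) / (hi - lo))
    return out

def find_hydrophobic_blobs(seq, threshold=0.5, min_len=4, window=3):
    """Return half-open (start, end) index pairs for contiguous runs whose
    smoothed hydropathy exceeds threshold and that span at least min_len
    residues. Shorter runs are discarded."""
    h = smoothed_hydropathy(seq, window)
    blobs, start = [], None
    # A -inf sentinel at the end closes a run that reaches the last residue.
    for i, value in enumerate(h + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                blobs.append((start, i))
            start = None
    return blobs
```

For example, `find_hydrophobic_blobs("DDDDIIIIIIDDDD")` returns `[(4, 10)]`, the index range of the six isoleucines, while the two-isoleucine run in `"DDDIIDDD"` is discarded because it is shorter than `min_len`. Making the threshold and minimum length explicit parameters mirrors the abstract's point that blobs are identified "with biologically relevant parameters".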


“…This entails identifying promising compounds from a large pool of molecules and receiving ADMET feedback before actual synthesis. To address these issues, the development of quantitative structure–property relationship (QSPR) models, using computer technology to predict ADMET properties, has emerged as a cost-effective and efficient alternative.…”

Section: Introduction (mentioning, confidence: 99%)

Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge

Duan, Yang, Zeng et al. 2024

J. Med. Chem.

Self Cite


Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.
