Designing drugs when there is low data availability: one-shot learning and other approaches to face the issues of a long-term concern

Veríssimo, Gabriel Corrêa; Serafim, Mateus Sá Magalhães; Kronenberger, Thales; Ferreira, Rafaela Salgado; Honório, Káthia Maria

doi:10.1080/17460441.2022.2114451

Cited by 11 publications

(7 citation statements)

References 218 publications


“…In the future, deep learning models, by learning and generalizing across feature representations, hold the promise of enhancing predictive accuracy and broadening the scope of data analysis in the study of cardiotoxicity. Further, a recurring challenge in using comprehensive -omics data is the sparsity of data, which limits prospective validation. This necessitates the development of models that can make reliable predictions even with sparse or incomplete data sets.…”

Section: Results (mentioning)

confidence: 99%

“…Further, a recurring challenge in using comprehensive -omics data is the sparsity of data, which limits prospective validation [88]. This necessitates the development of models that can make reliable predictions even with sparse or incomplete data sets. In this study, we observed that models based on computed physicochemical properties performed on par with other ensemble models.…”

Section: Results (mentioning)

confidence: 99%


Insights into Drug Cardiotoxicity from Biological and Chemical Data: The First Public Classifiers for FDA Drug-Induced Cardiotoxicity Rank

Seal, Spjuth, Hosseini-Gerami et al. 2024, J. Chem. Inf. Model.

Drug-induced cardiotoxicity (DICT) is a major concern in drug development, accounting for 10–14% of postmarket withdrawals. In this study, we explored the capabilities of chemical and biological data to predict cardiotoxicity, using the recently released DICTrank data set from the United States FDA. We found that such data, including protein targets, especially those related to ion channels (e.g., hERG), physicochemical properties (e.g., electrotopological state), and peak concentration in plasma, offer strong predictive ability for DICT. Compounds annotated with mechanisms of action such as cyclooxygenase inhibition could distinguish between most-concern and no-concern DICT. Cell Painting features for ER stress discerned most-concern cardiotoxic from nontoxic compounds. Models based on physicochemical properties provided substantial predictive accuracy (AUCPR = 0.93). With the availability of omics data in the future, using biological data promises enhanced predictability and deeper mechanistic insights, paving the way for safer drug development. All models from this study are available at https://broad.io/DICTrank_Predictor.


“…In summary, those observations could be very useful in the design of novel cytotoxic compounds with potential anticancer application. Since all compounds share structural similarity and the biological data were obtained without interlaboratory interference, despite the low number of samples the obtained information is highly valuable for drug design campaigns [18] …”

Section: Results (mentioning)

confidence: 99%

“…Since all compounds share structural similarity and the biological data were obtained without interlaboratory interference, despite the low number of samples the obtained information is highly valuable for drug design campaigns. [18]…”

Section: Computational Studies (mentioning)

confidence: 99%

Synthesis and Cytotoxic Studies of Pyrrole and Pyrrolidine Derivatives in Human Tumor Cell Lines

Lino, Freitas, Villarreal et al. 2024, ChemistrySelect

Heterocyclic compounds such as pyrrole and pyrrolidine derivatives have a broad spectrum of biological activity and are widely used as pharmacophores to design novel bioactive compounds. In this work, sixteen pyrrole and pyrrolidine derivatives were synthesized and evaluated for their growth inhibitory activity on two human cancer cell lines: breast cancer (MDA‐MB‐231) and chronic myelogenous leukemia (K562). Eight compounds showed activity against at least one tumor cell line (IC50 = 49.15–195.9 μM). The compounds were tested in non‐cancerous human lung fibroblasts (WI‐26VA4) to evaluate their selectivity index. In addition, Hierarchical Cluster Analysis (HCA) studies were carried out in an attempt to establish a structure‐activity relationship.


“…Examples include "Small Data Set QSAR Modeling," which finds predictive models by using exhaustive cross-validation across different sampling replicates, oversampling strategies, and the use of other machine learning methods based on transfer and few-shot learning [59][60][61].…”

Section: Validation of QSAR Models (mentioning)

confidence: 99%

MASSA Algorithm: an automated rational sampling of training and test subsets for QSAR modeling

Veríssimo, Pantaleão, Fernandes et al. 2023, J Comput Aided Mol Des (self-citation)

QSAR models capable of predicting biological, toxicity, and pharmacokinetic properties were widely used to search lead bioactive molecules in chemical databases. The dataset's preparation to build these models has a strong influence on the quality of the generated models, and sampling requires that the original dataset be divided into training (for model training) and test (for statistical evaluation) sets. This sampling can be done randomly or rationally, but the rational division is superior. In this paper, we present MASSA, a Python tool that can be used to automatically sample datasets by exploring the biological, physicochemical, and structural spaces of molecules using PCA, HCA, and K-modes. The proposed algorithm is very useful when the variables used for QSAR are not available or to construct multiple QSAR models with the same training and test sets, producing models with lower variability and better values for validation metrics. These results were obtained even when the descriptors used in the QSAR/QSPR were different from those used in the separation of training and test sets, indicating that this tool can be used to build models for more than one QSAR/QSPR technique. Finally, this tool also generates useful graphical representations that can provide insights into the data.
