2022
DOI: 10.26434/chemrxiv-2022-dct7l
Preprint

MASSA Algorithm: automated rational sampling of training and test subsets for QSAR modelling

Abstract: The use of computer-aided drug design has become an essential part of drug development. In this context, QSAR models capable of predicting biological activities, toxicity, and pharmacokinetic properties have been widely used to search chemical databases for bioactive lead compounds. The preparation of the dataset used to build these models strongly influences the quality of the generated models, and sampling these data requires that the original dataset be divided into training (used for model traini…

Cited by 3 publications (3 citation statements)
References 25 publications
“…The lowest energy conformer for each compound was generated using OMEGA 3.1.1.2 (OpenEye Scientific Software, USA, 2019), followed by ionization state adjustment at pH 7.4 with FixpKa (QUACPAC 2.0.1.2, OpenEye Scientific Software, USA, 2019), selecting a single favorable ionization state. Finally, the remaining compounds (n = 97) were split into training and test sets (80% and 20%, respectively) using a preliminary version of the MASSA algorithm [51] implemented on the KNIME platform [52]. The most active (lowest half-maximum inhibitory concentration; IC50) compound of each study, herein referred to as 62 [29] (IC50 of 18 nM) and 106 [30] (IC50 of 21 nM), was removed from the test set.…”
Section: Data Set Selection and Preparation for HQSAR (mentioning)
confidence: 99%
“…In the literature, several works can be found describing the construction of predictive models for the most diverse biological activities using different QSAR We present in this paper an open-source, easy-to-use Python tool called MASSA Algorithm ("Molecular dAta Set SAmpling Algorithm") [21] to perform the automatic sampling of datasets of molecules into training and test sets. This algorithm is based on hierarchical clustering analysis of physicochemical and structural spaces, as well as the dependent variables (biological activities).…”
Section: Introduction (mentioning)
confidence: 99%
“…In the literature, several works can be found describing the construction of predictive models for the most diverse biological activities using different QSAR We present in this paper an open-source, easy-to-use Python tool called MASSA Algorithm ("Molecular dAta Set SAmpling Algorithm") [28] to perform the automatic sampling of datasets of molecules into training and test sets. This algorithm is based on hierarchical clustering analysis of physicochemical and structural spaces, as well as the dependent variables (biological activities).…”
Section: Introduction (mentioning)
confidence: 99%
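
The citation statements above describe sampling a dataset into training and test sets (80% and 20% in one case) on the basis of hierarchical clustering. The following is a minimal sketch of that general idea, not the MASSA implementation: it assumes a single combined descriptor space (the statements mention separate physicochemical, structural, and biological-activity spaces), uses a naive complete-linkage clustering written from scratch, and draws the test fraction from each cluster. The function names, the toy descriptor values, and the parameters `n_clusters`, `test_fraction`, and `seed` are all hypothetical.

```python
import math
import random


def agglomerative_clusters(points, n_clusters):
    """Naive complete-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    Illustrative only: cost grows quickly with the number of points.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > max(1, n_clusters):
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_stratified_split(points, n_clusters=3, test_fraction=0.2, seed=0):
    """Split indices into train/test, sampling the test part per cluster."""
    rng = random.Random(seed)
    train, test = [], []
    for members in agglomerative_clusters(points, n_clusters):
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical toy data: 10 molecules, 2 scaled descriptors each,
# forming two well-separated groups of five.
toy_descriptors = [
    (0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2), (0.0, 0.0),
    (5.0, 5.1), (5.1, 5.0), (5.2, 5.2), (5.0, 5.0), (5.1, 5.2),
]

train_idx, test_idx = cluster_stratified_split(
    toy_descriptors, n_clusters=2, test_fraction=0.2
)
print("train:", train_idx)
print("test:", test_idx)
```

Sampling within each cluster, rather than from the whole dataset at random, is meant to keep every region of the descriptor space represented in both subsets. With the toy data, each group of five contributes one molecule to the test set.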