2020
DOI: 10.1021/acs.jcim.0c00908

Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction

Abstract: During the drug development process, it is common to carry out toxicity tests and adverse effect studies, which are essential to guarantee patient safety and the success of the research. The use of in silico quantitative structure−activity relationship (QSAR) approaches for this task involves processing a huge amount of data that, in many cases, have an imbalanced distribution of active and inactive samples. This is usually termed the class-imbalance problem and may have a significant negative effect on the pe…

Cited by 14 publications (9 citation statements)
References 41 publications
“…These ensembles were constructed using two well-known feature selection methods: fast clustering-based feature selection (FAST) and fast correlation-based filter (FCBF). 81 They tested the classification performance of two ensemble methods and three ML algorithms (DT, SVM, and RF) using G-mean and MCC as evaluation metrics. These metrics take into account the uneven distribution of class samples.…”
Section: Various Toxicities Predictions
confidence: 99%
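
The citation statement above names G-mean and MCC as metrics that account for an uneven class distribution. A minimal sketch of both, assuming binary labels with 1 for active and 0 for inactive, follows; the function names are illustrative and do not come from the cited papers.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the active class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced sample: 2 actives, 8 inactives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_inactive = [0] * 10  # a trivial classifier that never predicts "active"

accuracy = sum(t == p for t, p in zip(y_true, all_inactive)) / len(y_true)
print(accuracy, g_mean(y_true, all_inactive), mcc(y_true, all_inactive))
```

On this toy sample the trivial classifier reaches an accuracy of 0.8 while both G-mean and MCC are 0, which is why such metrics are preferred over plain accuracy for imbalanced data.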
“…The structure-based molecular design mainly includes a receptor-based method through a three-dimensional (3D) chemical structure to obtain ligand interaction [1,35,36]. However, traditional QSAR models may frequently miss suitable candidate molecules, because of the poor predictive accuracy and versatility caused by poor feature selection that requires skill and knowledge and conformational limitations for coincidence effect [1,[37][38][39]. Therefore, a QSAR system with high-throughput and performance is desired because of the development of novel medicines, chemicals, and nanomaterials on human health.…”
Section: Introduction
confidence: 99%
“…Ensembles of feature selectors focused on overcoming class imbalance problems have also been proposed. 31 …”
Section: Introduction
confidence: 99%
“…Ensembles of feature selectors focused on overcoming class imbalance problems have also been proposed. 31 In the construction of feature selection ensembles, the combination of the results of the different base selectors is crucial. 32 The set of methods for combining feature subset selectors is usually limited to take into account the result of applying each feature selector by storing in a vector the number of times that each feature was selected; this vector is used to obtain the final selection.…”
Section: Introduction
confidence: 99%
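
The vote-count combination described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the method of any cited paper: features are identified by integer index, each base selector returns a list of selected indices, and the final selection keeps features that reach a hypothetical `min_votes` threshold.

```python
def count_votes(selections, n_features):
    """Build the vote vector: votes[i] is the number of selectors that chose feature i."""
    votes = [0] * n_features
    for subset in selections:
        for index in set(subset):  # set() so a selector cannot vote twice for one feature
            votes[index] += 1
    return votes


def combine_by_votes(selections, n_features, min_votes):
    """Keep the features chosen by at least min_votes base selectors (assumed rule)."""
    votes = count_votes(selections, n_features)
    return [i for i, v in enumerate(votes) if v >= min_votes]


# Hypothetical output of three base feature selectors over five features.
selections = [[0, 2, 3], [2, 3, 4], [0, 2]]
votes = count_votes(selections, n_features=5)
selected = combine_by_votes(selections, n_features=5, min_votes=2)
print(votes, selected)
```

Setting `min_votes` to the number of selectors gives the intersection of the subsets, and setting it to 1 gives their union; intermediate values act as a majority-style vote.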