Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction

Antelo-Collado, Aurelio; Carrasco-Velar, Ramón; García-Pedrajas, Nicolás; García, Gonzalo Cerruela

doi:10.1021/acs.jcim.0c00908

Cited by 14 publications

(9 citation statements)

References 41 publications


“…These ensembles were constructed using two well-known feature selection methods: fast clustering-based feature selection (FAST) and fast correlation-based filter (FCBF). 81 They tested the classification performance of two ensemble methods and three ML algorithms (DT, SVM, and RF) using G-mean and MCC as evaluation metrics. These metrics take into account the uneven distribution of class samples.…”

Section: Various Toxicities Predictions (mentioning)

confidence: 99%

Machine Learning Toxicity Prediction: Latest Advances by Toxicity End Point

Cavasotto, Scardino (2022), ACS Omega


Machine learning (ML) models to predict the toxicity of small molecules have garnered great attention and have become widely used in recent years. Computational toxicity prediction is particularly advantageous in the early stages of drug discovery in order to filter out molecules with high probability of failing in clinical trials. This has been helped by the increase in the number of large toxicology databases available. However, being an area of recent application, a greater understanding of the scope and applicability of ML methods is still necessary. There are various kinds of toxic end points that have been predicted in silico. Acute oral toxicity, hepatotoxicity, cardiotoxicity, mutagenicity, and the 12 Tox21 data end points are among the most commonly investigated. Machine learning methods exhibit different performances on different data sets due to dissimilar complexity, class distributions, or chemical space covered, which makes it hard to compare the performance of algorithms over different toxic end points. The general pipeline to predict toxicity using ML has already been analyzed in various reviews. In this contribution, we focus on the recent progress in the area and the outstanding challenges, making a detailed description of the state-of-the-art models implemented for each toxic end point. The type of molecular representation, the algorithm, and the evaluation metric used in each research work are explained and analyzed. A detailed description of end points that are usually predicted, their clinical relevance, the available databases, and the challenges they bring to the field are also highlighted.
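The citation statement above names G-mean and MCC as metrics that account for uneven class distributions. As a minimal sketch of why they are preferred over plain accuracy on imbalanced data, the functions below compute all three from binary confusion-matrix counts. The function names and the example counts are illustrative assumptions, not taken from the cited papers, and returning 0.0 when a denominator is zero is a common convention rather than part of either definition.

```python
import math


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    # Assumed convention: a class with no samples contributes a rate of 0.0.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical data set: 5 toxic (positive) and 95 non-toxic (negative) molecules.
# A trivial classifier that labels everything non-toxic:
trivial = dict(tp=0, fn=5, tn=95, fp=0)
# A classifier that recovers 4 of the 5 toxic molecules with 5 false alarms:
useful = dict(tp=4, fn=1, tn=90, fp=5)

for name, counts in (("trivial", trivial), ("useful", useful)):
    print(name, accuracy(**counts), g_mean(**counts), mcc(**counts))
```

In this hypothetical case the trivial classifier reaches 0.95 accuracy while its G-mean and MCC are both 0, which is the behavior that makes these two metrics informative when one class dominates.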


“…The structure-based molecular design mainly includes a receptor-based method through a three-dimensional (3D) chemical structure to obtain ligand interaction [1,35,36]. However, traditional QSAR models may frequently miss suitable candidate molecules, because of the poor predictive accuracy and versatility caused by poor feature selection that requires skill and knowledge and conformational limitations for coincidence effect [1,37-39]. Therefore, a QSAR system with high-throughput and performance is desired because of the development of novel medicines, chemicals, and nanomaterials on human health.…”

Section: Introduction (mentioning)

confidence: 99%

A Deep Learning-Based Quantitative Structure–Activity Relationship System Construct Prediction Model of Agonist and Antagonist with High Performance

Matsuzaka, Uesawa (2022), IJMS


Molecular design and evaluation for drug development and chemical safety assessment have been advanced by quantitative structure–activity relationship (QSAR) using artificial intelligence techniques, such as deep learning (DL). Previously, we have reported the high performance of prediction models molecular initiation events (MIEs) on the adverse toxicological outcome using a DL-based QSAR method, called DeepSnap-DL. This method can extract feature values from images generated on a three-dimensional (3D)-chemical structure as a novel QSAR analytical system. However, there is room for improvement of this system’s time-consumption. Therefore, in this study, we constructed an improved DeepSnap-DL system by combining the processes of generating an image from a 3D-chemical structure, DL using the image as input data, and statistical calculation of prediction-performance. Consequently, we obtained that the three prediction models of agonists or antagonists of MIEs achieved high prediction-performance by optimizing the parameters of DeepSnap, such as the angle used in the depiction of the image of a 3D-chemical structure, data-split, and hyperparameters in DL. The improved DeepSnap-DL system will be a powerful tool for computer-aided molecular design as a novel QSAR system.


“…Ensembles of feature selectors focused on overcoming class imbalance problems have also been proposed. 31 …”

Section: Introduction (mentioning)

confidence: 99%

“…Ensembles of feature selectors focused on overcoming class imbalance problems have also been proposed. 31 In the construction of feature selection ensembles, the combination of the results of the different base selectors is crucial. 32 The set of methods for combining feature subset selectors is usually limited to take into account the result of applying each feature selector by storing in a vector the number of times that each feature was selected; this vector is used to obtain the final selection.…”

Section: Introduction (mentioning)

confidence: 99%


Graph-Based Feature Selection Approach for Molecular Activity Prediction

García, Cuevas-Muñoz, García-Pedrajas (2022), J. Chem. Inf. Model. (self-citation)


In the construction of QSAR models for the prediction of molecular activity, feature selection is a common task aimed at improving the results and understanding of the problem. The selection of features allows elimination of irrelevant and redundant features, reduces the effect of dimensionality problems, and improves the generalization and interpretability of the models. In many feature selection applications, such as those based on ensembles of feature selectors, it is necessary to combine different selection processes. In this work, we evaluate the application of a new feature selection approach to the prediction of molecular activity, based on the construction of an undirected graph to combine base feature selectors. The experimental results demonstrate the efficiency of the graph-based method in terms of the classification performance, reduction, and redundancy compared to the standard voting method. The graph-based method can be extended to different feature selection algorithms and applied to other cheminformatics problems.
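The citation statement above describes the usual way of combining base feature selectors: a vector stores how many times each feature was selected, and that vector drives the final selection. A minimal sketch of this voting scheme follows. The function name `combine_by_votes`, the `min_votes` threshold rule, and the example subsets are hypothetical; the source says only that the count vector is used to obtain the final selection, not how it is thresholded.

```python
def combine_by_votes(selections, n_features, min_votes):
    """Combine feature subsets from several base selectors by vote counting.

    selections : list of iterables of feature indices, one per base selector
    n_features : total number of features in the data set
    min_votes  : assumed rule, keep features chosen by at least this many selectors
    """
    votes = [0] * n_features
    for subset in selections:
        # set() guards against a selector listing the same feature twice.
        for idx in set(subset):
            votes[idx] += 1
    selected = [i for i, count in enumerate(votes) if count >= min_votes]
    return votes, selected


# Hypothetical output of three base selectors over six features.
subsets = [[0, 2, 5], [2, 3, 5], [1, 2]]
votes, selected = combine_by_votes(subsets, n_features=6, min_votes=2)
print(votes)     # [1, 1, 3, 1, 0, 2]
print(selected)  # [2, 5]
```

A majority threshold is only one possible rule; keeping the top-k features by vote count would use the same vector. The graph-based method in the abstract above is presented as an alternative to this standard voting and is not reproduced here.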
