QSAR modeling of peptide biological activity by coupling support vector machine with particle swarm optimization algorithm and genetic algorithm

Zhou, Xuan; Zou, Xiaoyong; Dai, Zong; Zou, Xiaoyong

doi:10.1016/j.jmgm.2010.06.002

Cited by 34 publications

(20 citation statements)

References 56 publications


“…Zhou et al [7] seems to bias the results because the performance reported with this set is outside of the ranges shown with the bootstrapping.…”

Section: Results (mentioning)

confidence: 76%

“…Some of the methods that have been used to predict antimicrobial peptides include Partial Least Squares [2,3], Artificial Neural Networks [4], Multiple Linear Regression [5,6], and Support Vector Regression (SVR) [7][8][9], among others. Performance assessment of these methods is typically limited to few metrics obtained with fixed validation sets, measuring the distance of prediction from the real output, but providing little evidence on whether the used methods have found a real correlation.…”

Section: Introduction (mentioning)

confidence: 99%
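The excerpt above notes that distance-style metrics say little about whether a model has found a real structure-activity correlation. A permutation (y-scrambling) test is one standard way to check this: shuffle the activities many times and count how often the shuffled data score at least as well as the real data. The sketch below is a minimal illustration under simplifying assumptions: a single descriptor and the absolute Pearson correlation stand in for a fitted SVR model's score, and the function names are hypothetical.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_test(x, y, n_perm=999, seed=0):
    """Return (|r| on real data, permutation p-value).

    The p-value estimates how often shuffled activities correlate with the
    descriptor at least as strongly as the real activities do.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any real descriptor-activity link
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # add-one correction: the observed labelling counts as one permutation
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value means that scrambled activities rarely reach the real score, which is evidence of a genuine correlation rather than a lucky fit; a large one suggests the model's apparent performance could arise by chance.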

“…For example, Zhou et al [7] used approximately 1500 descriptors, which were reduced to 711 after preprocessing, together with Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Support Vector Regression (SVR). However, they only used two metrics (R and RMSE) in the model assessing phase, and according to the literature, it is convenient to use additional metrics such as R² and R²pred [10].…”

Section: Introduction (mentioning)

confidence: 99%
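The four metrics named in the excerpt differ in what they compare against. The sketch below uses one common set of definitions (an assumption, since the excerpt does not define them): R is the Pearson correlation between observed and predicted activities, RMSE is the root-mean-square error, R² = 1 − SS_res / SS_tot uses the test-set mean, and R²pred uses the same residuals but measures them against the training-set mean. The function name `regression_metrics` is hypothetical.

```python
import math
import statistics

def regression_metrics(y_true, y_pred, y_train_mean):
    """Return R, RMSE, R^2 and R^2_pred for one external test set."""
    n = len(y_true)
    residual_ss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = statistics.fmean(y_true)
    total_ss = sum((t - mean_true) ** 2 for t in y_true)

    # R: Pearson correlation between observed and predicted activities
    mean_pred = statistics.fmean(y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    pred_ss = sum((p - mean_pred) ** 2 for p in y_pred)
    r = cov / math.sqrt(total_ss * pred_ss)

    rmse = math.sqrt(residual_ss / n)
    r2 = 1 - residual_ss / total_ss
    # R^2_pred: same residuals, but the baseline is the TRAINING-set mean
    r2_pred = 1 - residual_ss / sum((t - y_train_mean) ** 2 for t in y_true)
    return {"R": r, "RMSE": rmse, "R2": r2, "R2_pred": r2_pred}

m = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], y_train_mean=2.0)
```

R alone can be high even when predictions are systematically shifted (a constant offset leaves R unchanged), which is one reason to report R² and R²pred alongside it.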


The Elements of Statistical Learning

Hastie, Tibshirani, Friedman (2009). Springer Series in Statistics.


This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides, a setting in which the cost and complexity of the chemical processes involved keep datasets particularly small (fewer than a few hundred instances). As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods as a preliminary step toward reproducible results across experimental setups and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and feature selection (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach guided our search for suitable machine learning methods and yielded results comparable to those in the literature with strong statistical stability.

Keywords: antimicrobial peptides; learning curves; machine learning; statistical stability; support vector regression.

* Universidad Industrial de Santander (Bucaramanga-Santander, Colombia). francy.camacho1@correo.uis.edu.co.
** Universidad Industrial de Santander (Bucaramanga-Santander, Colombia). rodrigo.torres@ecopetrol.com.co.
*** Universidad Industrial de Santander (Bucaramanga-Santander, Colombia). rramosp@uis.edu.co.
Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides

Resumen (Spanish abstract, translated): This work demonstrates the importance of obtaining statistically stable results when computational learning methods are used to predict the activity of antimicrobial peptides, where, owing to the cost and complexity of the chemical processes, datasets are particularly small (fewer than a few hundred instances). As in other fields with similar problems, this produces large variability in the performance of predictive models, which hinders any attempt to transfer them to practice. Therefore, unlike other works that report peak predictive performance obtained in very particular experimental setups, we focus on characterizing the behavior of machine learning methods as a preliminary step toward results that are reproducible, statistically stable and, ultimately, competitive in predictive capacity. For this purpose, a methodology was designed that integrates the learning of…
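The abstract relies on bootstrapping to show how much a performance figure can vary on small datasets. A minimal sketch, assuming a fixed set of test predictions (the study may instead resample training/test splits and refit the model, which this sketch does not do): resample the (observed, predicted) pairs with replacement, recompute RMSE each time, and take a percentile interval. Function names are hypothetical.

```python
import math
import random

def rmse(pairs):
    """Root-mean-square error over (observed, predicted) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def bootstrap_rmse_interval(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Point RMSE plus a percentile bootstrap interval at level 1 - alpha."""
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    stats = sorted(
        # each replicate resamples the test pairs with replacement
        rmse([rng.choice(pairs) for _ in pairs])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return rmse(pairs), (lo, hi)
```

A single reported score that falls well outside such an interval is the kind of discrepancy the Results excerpt above describes for one fixed validation set.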


“…Zhou et al [175] proposed a new method that combines particle swarm optimization algorithm (PSO) and genetic algorithm (GA) to optimize the kernel parameters of support vector machine (SVM) and determine the optimized features subset in parallel. These authors applied their method to four peptide datasets for quantitative structure-activity relationship (QSAR) research.…”

Section: Current Evolutionary Feature Selection Methods and Applicatio… (mentioning)

confidence: 99%
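The excerpt describes coupling PSO and GA so that the SVM kernel parameters and the feature subset are optimized in parallel; it does not give the exact coupling scheme. The sketch below shows one hypothetical way to do it: each particle carries a continuous part (log2 C, log2 gamma) moved by standard PSO velocity updates, and a binary feature mask updated with GA operators (uniform crossover with the global-best mask, then bit-flip mutation). `toy_fitness` is an invented stand-in for the cross-validated SVR error that a real run would compute; its optimum is arbitrary.

```python
import random

def toy_fitness(params, mask):
    """Hypothetical stand-in for cross-validated SVR error (lower is better).

    Pretends the best (log2 C, log2 gamma) is (3, -5) and that only the
    first three descriptors are informative.
    """
    c, g = params
    param_err = (c - 3.0) ** 2 + (g + 5.0) ** 2
    informative = [1, 1, 1] + [0] * (len(mask) - 3)
    feat_err = sum(m != i for m, i in zip(mask, informative))
    return param_err + feat_err

def pso_ga(n_features=8, n_particles=20, n_iter=200, seed=0):
    rng = random.Random(seed)
    bounds = [(-5.0, 15.0), (-15.0, 3.0)]  # search box for log2 C, log2 gamma
    swarm = []
    for _ in range(n_particles):
        pos = [rng.uniform(lo, hi) for lo, hi in bounds]
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        swarm.append({"pos": pos, "vel": [0.0, 0.0], "mask": mask,
                      "best_pos": pos[:], "best_mask": mask[:],
                      "best_fit": toy_fitness(pos, mask)})
    first = min(swarm, key=lambda p: p["best_fit"])
    g_pos, g_mask, g_fit = first["best_pos"][:], first["best_mask"][:], first["best_fit"]
    w, c1, c2, p_mut = 0.7, 1.5, 1.5, 1.0 / n_features

    for _ in range(n_iter):
        for p in swarm:
            # PSO step on the continuous kernel parameters (clamped to bounds)
            for d, (lo, hi) in enumerate(bounds):
                p["vel"][d] = (w * p["vel"][d]
                               + c1 * rng.random() * (p["best_pos"][d] - p["pos"][d])
                               + c2 * rng.random() * (g_pos[d] - p["pos"][d]))
                p["pos"][d] = min(max(p["pos"][d] + p["vel"][d], lo), hi)
            # GA step on the feature mask: uniform crossover with the
            # global-best mask, then bit-flip mutation
            child = [gb if rng.random() < 0.5 else own
                     for own, gb in zip(p["mask"], g_mask)]
            p["mask"] = [1 - b if rng.random() < p_mut else b for b in child]

            fit = toy_fitness(p["pos"], p["mask"])
            if fit < p["best_fit"]:
                p["best_pos"], p["best_mask"], p["best_fit"] = p["pos"][:], p["mask"][:], fit
                if fit < g_fit:
                    g_pos, g_mask, g_fit = p["pos"][:], p["mask"][:], fit
    return g_pos, g_mask, g_fit
```

Keeping both parts in one particle means a parameter setting is always judged together with the descriptor subset it was evaluated on, which is the point of optimizing them in parallel rather than one after the other.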

Evolutionary Computation and QSAR Research

Aguiar‐Pulido, Gestal, Cruz-Monteagudo, et al. (2013). CAD.


The successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, which builds mathematical models that correlate the activity of a molecule with its molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning, and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods that support these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods for building QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection methods.
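The three QSAR steps in the abstract can be sketched end to end on peptides. This is a minimal illustration under stated assumptions, not a method from the review: amino acid composition is used as the descriptor set (step 1), a single-descriptor correlation filter replaces evolutionary subset selection (step 2), and ordinary least squares is the model (step 3). The peptides and activities are hypothetical toy values, and all function names are invented for illustration.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Step 1: encode a peptide as the fraction of each of the 20 residues."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def select_best_descriptor(X, y):
    """Step 2: index of the descriptor most correlated (|r|) with activity."""
    best_j, best_r = None, -1.0
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        if len(set(col)) == 1:
            continue  # a constant descriptor carries no information
        r = abs(pearson(col, y))
        if r > best_r:
            best_j, best_r = j, r
    return best_j

def fit_line(x, y):
    """Step 3: least-squares fit of activity = a * descriptor + b."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    a = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return a, my - a * mx

# Hypothetical toy data: activity rises with lysine (K) content.
peptides = ["KGAA", "KKGA", "KKKG", "GAAA", "KKGG"]
activity = [2.0, 4.0, 6.0, 0.0, 4.0]
X = [composition(s) for s in peptides]
j = select_best_descriptor(X, activity)
a, b = fit_line([row[j] for row in X], activity)
```

An evolutionary method, as the review discusses, would replace step 2 with a search over descriptor subsets scored by the model built in step 3.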


“…Peptides play a significant role in a vast array of biological functions. Therefore, peptides are in focus of innovative drug development efforts, due to its high activity, high selectivity and fewer side effects. QSAR including 2D‐QSAR, 3D‐QSAR and MF‐QSAR, is one of the approaches in finding the relationship between structures and the activity of peptide drugs.…”

Section: Introduction (mentioning)

confidence: 99%

A New Descriptor of Amino Acids‐SVGER and its Applications in Peptide QSAR

Tong, Bai, et al. (2016). Molecular Informatics.


In the study of peptide quantitative structure-activity relationships (QSAR), a new descriptor of amino acids (SVGER) was calculated. It was applied to two peptide datasets: angiotensin-converting enzyme (ACE) inhibitors and the bitter-tasting threshold of dipeptides. QSAR models were built by stepwise multiple regression-multiple linear regression (SMR-MLR) and stepwise multiple regression-partial least squares regression (SMR-PLS). In the SMR-MLR model for ACE inhibitors, the squared cross-validation correlation coefficient (Q²) was 0.907, the squared correlation coefficient between predicted and observed activities (R²) was 0.977, and the external multiple correlation coefficient (Q²ext) was 0.867. The corresponding values for the bitter-tasting threshold of dipeptides were 0.802, 0.966, and 0.719. In the SMR-PLS models, Q², R², and Q²ext were 0.804, 0.915, and 0.858 for ACE inhibitors and 0.782, 0.881, and 0.747 for the bitter-tasting threshold of dipeptides. Our results show that the SVGER descriptor gives a good account of the relationship between the structure and activity of peptide drugs.
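The abstract reports a cross-validated Q² next to the fitted R². One common form is leave-one-out: Q² = 1 − PRESS / Σ(yᵢ − ȳ)², where PRESS = Σ(yᵢ − ŷ₍ᵢ₎)² and ŷ₍ᵢ₎ is the prediction for sample i from a model fitted without it. The abstract does not state the cross-validation scheme, so leave-one-out and a single-descriptor linear model (instead of the stepwise MLR/PLS models used there) are assumptions of this sketch; the function names are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Least-squares fit of y = a * x + b."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    a = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return a, my - a * mx

def q2_loo(x, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_total for a one-descriptor model."""
    press = 0.0
    for i in range(len(y)):
        # refit without sample i, then predict the held-out activity
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a * x[i] + b)) ** 2
    mean_y = statistics.fmean(y)
    return 1 - press / sum((v - mean_y) ** 2 for v in y)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
q2 = q2_loo(x, y)
```

Because every prediction comes from a model that never saw that sample, Q² is typically lower than the in-sample R², which is why the two are reported together.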
