Secondary structure predictions are increasingly the workhorse of methods that aim to predict protein structure and function. Here we use ensembles of bidirectional recurrent neural network architectures, PSI-BLAST-derived profiles, and a large nonredundant training set to derive two new predictors: (a) the second version of the SSpro program for secondary structure classification into three categories and (b) the first version of the SSpro8 program for secondary structure classification into the eight classes produced by the DSSP program. We describe results on three different test sets, on which SSpro achieved a sustained performance of about 78% correct prediction. We report confusion matrices, compare PSI-BLAST-derived to BLAST-derived profiles, and assess the corresponding performance improvements. SSpro and SSpro8 are implemented as web servers, available together with other structural feature predictors at: http://promoter.ics.uci.edu/BRNN-PRED/.
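To make the evaluation terms concrete, the following is a minimal Python sketch of how a per-residue accuracy and a three-class confusion matrix can be computed from an observed and a predicted secondary structure string. The eight-to-three reduction used here (H and G to helix, E and B to strand, everything else to coil) is one common convention and is an assumption; the abstract does not state which reduction SSpro uses. The function names and example strings are hypothetical.

```python
from collections import Counter

# Assumed 8-to-3 reduction: H, G -> H (helix); E, B -> E (strand); all else -> C (coil).
DSSP_TO_3 = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_to_three(dssp_string):
    """Map a DSSP eight-class string to the three classes H, E, C."""
    return "".join(DSSP_TO_3.get(symbol, "C") for symbol in dssp_string)


def confusion_matrix(observed, predicted, classes="HEC"):
    """Return counts[observed_class][predicted_class] over aligned residues."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    counts = Counter(zip(observed, predicted))
    return {o: {p: counts[(o, p)] for p in classes} for o in classes}


def fraction_correct(observed, predicted):
    """Fraction of residues whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical example: a 13-residue DSSP assignment and a three-class prediction.
observed = reduce_to_three("HHHHGGTTEEBS-")
predicted = "HHHHHCCCEEECC"
matrix = confusion_matrix(observed, predicted)
accuracy = fraction_correct(observed, predicted)
```

Reading the matrix row by row shows which observed class is most often mistaken for which predicted class, which a single accuracy figure cannot show.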
Shallow machine learning methods have been applied to chemoinformatics problems with some success. As more data becomes available and more complex problems are tackled, deep machine learning methods may also become useful. Here we present a brief overview of deep learning methods and show in particular how recursive neural network approaches can be applied to the problem of predicting molecular properties. However, molecules are typically described by undirected cyclic graphs, while recursive approaches typically use directed acyclic graphs. We therefore develop methods to address this discrepancy, essentially by considering an ensemble of recursive neural networks associated with all possible vertex-centered acyclic orientations of the molecular graph. One advantage of this approach is that it relies only minimally on the identification of suitable molecular descriptors, since suitable representations are learnt automatically from the data. Several variants of this approach are applied to the problem of predicting aqueous solubility and tested on four benchmark datasets. Experimental results show that the performance of the deep learning methods matches or exceeds that of other state-of-the-art methods according to several evaluation metrics, and they expose the fundamental limitations arising from training sets that are too small or too noisy. A web-based predictor, AquaSol, is available online through the ChemDB portal (cdb.ics.uci.edu) together with additional material.
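The idea of vertex-centered acyclic orientations can be illustrated with a short Python sketch. It assumes one plausible construction: for each vertex chosen as the root, every edge is directed from the endpoint farther from the root to the endpoint closer to it, using breadth-first distances. Edges whose endpoints are equidistant from the root (which occur in odd cycles) are simply dropped here; that simplification, the function names, and the adjacency-dictionary representation are assumptions of this sketch, not details taken from the abstract. The graph is assumed to be connected.

```python
from collections import deque


def bfs_distances(adjacency, root):
    """Breadth-first distances from root to every reachable vertex."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance


def orient_toward(adjacency, root):
    """Directed edges (u, v) pointing one step closer to root.

    Distances strictly decrease along every edge, so the result is acyclic.
    Edges between equidistant vertices are omitted (sketch-level assumption).
    """
    distance = bfs_distances(adjacency, root)
    return {
        (u, v)
        for u in adjacency
        for v in adjacency[u]
        if distance[u] == distance[v] + 1
    }


def all_orientations(adjacency):
    """One acyclic orientation per vertex, each rooted at that vertex."""
    return {root: orient_toward(adjacency, root) for root in adjacency}


# Hypothetical example: a six-membered ring, such as the carbon skeleton of benzene.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
dags = all_orientations(ring)
```

In an ensemble of this kind, one recursive network would be evaluated on each of the rooted graphs and the per-root outputs combined into a single molecular representation; how that combination is done is left open here.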
The conventional wisdom is that certain classes of bioactive peptides have specific structural features that confer their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides. We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes, i.e., toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4–20 amino acids) and one focused on long peptides ( amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences between the features of short peptide and long peptide predictions; in particular, high-scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure. We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify, among a set of peptides, those that are more likely to be bioactive.