2L-piRNA: A Two-Layer Ensemble Classifier for Identifying Piwi-Interacting RNAs and Their Function

Liu, Bin; Yang, Fan; Chou, Kuo‐Chen

doi:10.1016/j.omtn.2017.04.008

Cited by 231 publications

(109 citation statements)

References 139 publications

(186 reference statements)

Supporting

Mentioning

108

Contrasting

Unclassified


“…To develop a useful sequence-based statistical predictor for a biological system as reported in a series of recent publications [74][75][76][77][78][79][80][81][82][83], the Chou's 5-step rule should be observed [84]: (1) How to construct or select a valid dataset to train and test the predictor? (2) How to formulate the biological sequence samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted?…”

Section: Methods (mentioning)

confidence: 99%

Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate

Yu, Li, Yang et al. 2017

Preprint


Knowledge of protein function is essential for the study of biological processes, the understanding of disease mechanisms and the exploration of novel therapeutic targets. Apart from experimental methods, a number of in-silico approaches have been developed and extensively used for protein function prediction. Among these approaches, BLAST predicts functions based on protein sequence similarity, and machine learning predicts functional families from protein sequences irrespective of their similarity, which complements BLAST and other methods in predicting diverse classes of proteins including distantly related proteins and homologous proteins of different functions. However, their identification accuracies and false discovery rates have not yet been assessed, which greatly limits the usage of these prediction algorithms. Herein, a comprehensive comparison of the performances of four popular functional prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these algorithms was systematically assessed by four metrics (sensitivity, specificity, accuracy and Matthews correlation coefficient) based on independent test datasets generated from 93 protein families defined by UniProtKB Keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model species (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, substantially higher sensitivity and stability were observed for BLAST and SVM than for PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of significantly reducing the false discovery rate (SVM < PNN ≈ KNN). In summary, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.

Keywords: false discovery rate; machine learning; protein function prediction; support vector machine; BLAST

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted:
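The four metrics named in this abstract (sensitivity, specificity, accuracy and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the cited preprint.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC for one binary classifier.

    tp, tn, fp, fn are confusion-matrix counts (hypothetical inputs here).
    Ratios with a zero denominator are reported as NaN; MCC falls back to 0.0,
    a common convention when any marginal total is zero.
    """
    nan = float("nan")
    total = tp + tn + fp + fn

    sensitivity = tp / (tp + fn) if (tp + fn) else nan   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else nan   # true negative rate
    accuracy = (tp + tn) / total if total else nan

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in binary_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced.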



“…Jun is activated through phosphorylation at Ser 63 and Ser 73 by JNK [92,93]. A high level of Jun has been observed in various types of cancer including non-small cell lung cancer, oral squamous cell carcinoma, breast cancer and colorectal cancer [94][95][96][97][98].…”

Section: Transcription Associated Genes (mentioning)

confidence: 99%

“…Visual comparison of the three Sv-IGFBP_N' complexes (Figure 5a) clearly demonstrates the binding interface of the N' insulin-binding domain (supported by HADDOCK2.2 simulations: Figure S3), with all highlighted interacting residues predicted by both PDBsum and PRODIGY (shown in Figure 5b; additional residues predicted by PRODIGY presented in Table S1). Of all the predicted interacting residues presented in Figure 5b, we have highlighted those amino acids of IGFBP_N' that show conserved interaction contacts with all three ligands (*), namely: the negatively charged Asp(D) 71 and Asp(D) 94; supported by the polar Gln(Q) 67 (where proton acceptor properties enable it to form two hydrogen bonds, stabilizing the overall negative charge); the neutrally charged Ser(S) 72 and Thr(T) 93; and Gly(G) 70, Gly(G) 91, and Gly(G) 92. In addition to these eight consistent contacts of IGFBP_N', PRODIGY predicts a further nine (Table S1).…”

Section: Complex Formation (mentioning)

confidence: 99%

Special Protein Molecules Computational Identification

2018


“…Ever since the concept of pseudo amino acid composition or Chou's PseAAC [55][56][57][58] was proposed, it has been widely used in many biomedicine and drug development areas [59,60] as well as nearly all the areas of computational proteomics (see, e.g., [39,43,45][61][62][63][64][65][66][67][68][69][70][71][72][73] and a long list of references cited in two review papers [74,75]). Encouraged by the successes of using PseAAC to deal with protein/peptide sequences, its idea and approach have been extended to deal with DNA/RNA sequences [76][77][78][79][80][81][82] in computational genomics via PseKNC (Pseudo K-tuple Nucleotide Composition) [83,84]. Recently, a very powerful web-server called "Pse-in-One" [85] and its updated version "Pse-in-One 2.0" [86] were developed, by which users can generate any pseudo components for both protein/peptide and DNA/RNA sequences as they wish or define.…”

Section: Proteins Sample Formulation (mentioning)

confidence: 99%
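To give a concrete sense of the "K-tuple nucleotide composition" that PseKNC builds on, the sketch below computes plain k-tuple (k-mer) frequencies for an RNA sequence. It is an illustration only: it omits the "pseudo" sequence-order correlation components that distinguish PseKNC, it is not the Pse-in-One implementation, and the function name `ktuple_composition` and its parameters are hypothetical.

```python
from itertools import product


def ktuple_composition(seq, k=2, alphabet="ACGU"):
    """Return normalized k-tuple frequencies of seq as a fixed-length vector.

    The vector has len(alphabet)**k entries, ordered lexicographically by
    k-tuple. DNA input is mapped to RNA (T -> U). Windows containing symbols
    outside the alphabet (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    valid = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            valid += 1

    # An all-zero vector is returned when no valid window exists.
    return [counts[m] / valid if valid else 0.0 for m in kmers]


if __name__ == "__main__":
    vec = ktuple_composition("UGAGGUAGUAGGUUGUAUAGUU", k=2)
    print(len(vec), round(sum(vec), 6))
```

A fixed-length vector of this kind is what lets sequences of different lengths be fed to the same classifier.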

“…incorrectly predicted to be of the i-th location. The metrics of Equation (19) have been widely used to examine the quality of predictors in genome/proteome analysis (see, e.g., [46,47][76][77][78][79][80][107][108][109]) and computational biomedicine (see, e.g., [82][110][111][112]). Given in Table 3 are the corresponding results obtained by pLoc-mGpos for each of the four subcellular locations.…”

Section: Comparison With the State-of-the-art Predictor (mentioning)

confidence: 99%

pLoc-mGpos: Incorporate Key Gene Ontology Information into General PseAAC for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins

Xiao, Cheng, Su et al. 2017

Self Cite


The basic unit of life is the cell. It contains many protein molecules located in its different organelles. The growth and reproduction of a cell, as well as most of its other biological functions, are performed via these proteins. But proteins in different organelles or subcellular locations have different functions. Facing the avalanche of protein sequences generated in the postgenomic age, we are challenged to develop high-throughput tools for identifying the subcellular localization of proteins based on their sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved. Most existing methods can deal with single-location proteins only. Yet proteins with multiple locations may have special biological functions that are particularly important for drug targets. Using the ML-GKR (Multi-Label Gaussian Kernel Regression) method, we developed a new predictor called "pLoc-mGpos" by extracting the key information from GO (Gene Ontology) into Chou's general PseAAC (Pseudo Amino Acid Composition) for predicting the subcellular localization of Gram-positive bacterial proteins with both single and multiple location sites. Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mGpos predictor is remarkably superior to "iLoc-Gpos", the state-of-the-art predictor for the same purpose. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGpos/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
