2013
DOI: 10.1371/journal.pone.0075542

Efficient and Interpretable Prediction of Protein Functional Classes by Correspondence Analysis and Compact Set Relations

Abstract: Predicting protein functional classes such as localization sites and modifications plays a crucial role in function annotation. Given the tremendous amount of sequence data produced by high-throughput sequencing experiments, the need for efficient and interpretable prediction strategies has grown rapidly. Our previous approach for subcellular localization prediction, PSLDoc, achieves high overall accuracy for Gram-negative bacteria. However, PSLDoc is computationally intensive due to incorporation of homo…

Citations: cited by 7 publications (4 citation statements)
References: 33 publications
“…1 b) datasets. TFPSSM extracts homology information from BLAST search, which has been shown to be efficient against a non-redundant database without losing prediction performance [ 11 ]. This has also been confirmed by our experiment because the dashed (original) and solid (non-redundant) lines of the same color are almost identical (compatible Fmax), which shows that the TFPSSM 1NN algorithm effectively picks out neighbors by keeping one representative of redundant sequences (non-redundant dataset).…”
Section: Results (mentioning)
confidence: 99%
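
The observation in this statement, that 1NN prediction is largely unaffected when redundant training sequences are collapsed to one representative, can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the toy feature vectors, the labels, the cosine similarity measure and the 0.99 redundancy threshold are assumptions for illustration and are not taken from TFPSSM or the cited experiment, where redundancy is defined at the sequence level.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_1nn(query, training):
    """Return the label of the training item most similar to the query."""
    _, label = max(training, key=lambda item: cosine(query, item[0]))
    return label


def drop_redundant(training, threshold=0.99):
    """Keep one representative per group of near-identical vectors.

    The threshold is an illustrative assumption, not a published value.
    """
    kept = []
    for vec, label in training:
        if all(cosine(vec, rep) < threshold for rep, _ in kept):
            kept.append((vec, label))
    return kept


# Hypothetical toy data: two redundant copies of the first item.
training = [
    ([1.00, 0.00, 0.20], "cytoplasm"),
    ([1.00, 0.00, 0.20], "cytoplasm"),   # exact duplicate
    ([0.98, 0.01, 0.21], "cytoplasm"),   # near duplicate
    ([0.00, 1.00, 0.10], "periplasm"),
]
non_redundant = drop_redundant(training)
query = [0.9, 0.1, 0.2]

print(len(training), len(non_redundant))
print(predict_1nn(query, training), predict_1nn(query, non_redundant))
```

Because the nearest neighbour of a query is either a representative or something very close to one, removing the extra copies shrinks the search set without changing the predicted label in this toy case.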
“…That is, the frequency of the gapped di-peptides is calculated based on the PSSM. The fast (insensitive) PSI-BLAST parameter setting is used to reduce running times (−matrix BLOSUM80 –evalue 1e-5 –gapopen 9 –gapextend 2 –threshold 999 –seq yes –soft_masking true –numter_iteration 2) [ 11 ].…”
Section: Methods (mentioning)
confidence: 99%
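
As a rough illustration of what "the frequency of the gapped di-peptides is calculated based on the PSSM" could look like, here is a minimal sketch. It assumes one plausible formulation: PSSM log-odds scores are squashed to (0, 1) with a logistic function, and the weight of a gapped di-peptide (a, gap, b) is the average product of the value for a at position i and the value for b at position i + gap + 1. The function names, the logistic scaling, the averaging and the `max_gap` parameter are assumptions made here, not details confirmed by the quoted statement; producing the PSSM with PSI-BLAST is not shown.

```python
import math

# Column order assumed for each PSSM row (hypothetical convention).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def logistic(score):
    """Squash a log-odds score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def gapped_dipeptide_features(pssm, max_gap=2):
    """Map (left residue, gap, right residue) to an averaged weight.

    pssm: list of rows, one per sequence position, each holding 20
    scores in AMINO_ACIDS order.
    """
    probs = [[logistic(s) for s in row] for row in pssm]
    features = {}
    for gap in range(max_gap + 1):
        n_pairs = len(probs) - gap - 1   # position pairs at this gap
        if n_pairs <= 0:
            continue                     # sequence too short for this gap
        for i in range(n_pairs):
            left, right = probs[i], probs[i + gap + 1]
            for ai, a in enumerate(AMINO_ACIDS):
                for bi, b in enumerate(AMINO_ACIDS):
                    key = (a, gap, b)
                    contribution = left[ai] * right[bi] / n_pairs
                    features[key] = features.get(key, 0.0) + contribution
    return features


def toy_row(favoured):
    """Hypothetical PSSM row strongly favouring a single residue."""
    return [10.0 if aa == favoured else -10.0 for aa in AMINO_ACIDS]


# Toy three-position profile resembling the sequence A-R-A.
features = gapped_dipeptide_features(
    [toy_row("A"), toy_row("R"), toy_row("A")], max_gap=1
)
print(round(features[("A", 0, "R")], 3), round(features[("A", 1, "A")], 3))
```

In the toy profile, the adjacent pair A-R occurs in one of the two adjacent position pairs, and A-x-A occurs in the only pair separated by one residue, so the second feature receives roughly twice the weight of the first.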
“…Previous works applied CA to analyze how peptides and protein structures impact their functionalities: in a study, CA was applied to analyze how peptide compositions of cheese are involved in their stretchability (Lacou et al, 2015). Another study initially based on CA focused on the clustering ways and levels of proteins according to their sequences (Chang et al, 2013). In the presented approach, clustering of peptides was carried out by initially structuring data by CA before HCA application.…”
Section: Discussion (mentioning)
confidence: 99%