2011
DOI: 10.1007/s00726-011-0923-1

Detecting thermophilic proteins through selecting amino acid and dipeptide composition features

Abstract: Detecting thermophilic proteins is an important task for designing stable protein engineering in interested temperatures. In this work, we develop a simple but efficient method to classify thermophilic proteins from mesophilic ones using the amino acid and dipeptide compositions. Since most of the amino acid and dipeptide compositions are redundant, we propose a new forward floating selection technique to select only a useful subset of these compositions as features for support vector machine-based classificat…
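To make the two feature families named in the abstract concrete, here is a minimal sketch of how amino acid composition (20 values) and dipeptide composition (400 values) are commonly computed from a sequence. The function names, the handling of non-standard residues, and the feature ordering are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue among the standard residues in seq."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return {a: (residues.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 adjacent residue pairs in seq.

    Assumption: pairs containing a non-standard symbol are skipped.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[a] for a in AMINO_ACIDS] + [dpc[p] for p in sorted(dpc)]
```

A feature selection step such as the one the abstract describes would then pick a subset of these 420 columns before training a classifier.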

Cited by 32 publications (29 citation statements)
References 33 publications
“…Lin et al constructed a dataset containing 915 thermophilic proteins and 793 non-thermophilic proteins, and predicted 93.8% thermophilic proteins and 92.7% nonthermophilic proteins using SVM. The same conclusion was also reached by Nakariyakul et al (2012), who obtained 93.3% identification accuracy in the same database used by Lin. In another study, Fan et al (2016) integrated information on the amino acid composition, evolution information, and acid dissociation constant to identify thermophiles by SVM, yielding an overall accuracy of 93.53%. Modarres et al (2018) proposed a new thermophilic protein database, which contained 14 million protein sequences.…”
Section: Introduction
Type: supporting
confidence: 77%
“…As shown in Table 7, the ranks of the top-five amino acids to be TPPs (propensity, difference) for Glu, Lys, Val, Arg and Ile are (1, 1), (2, 2), (3, 3), (4, 4) and (5, 5), respectively, while the ranks of the top-five amino acids to be non-TPPs for Gln, Thr, Ala, Asn and Phe are (20, 20), (19, 18), (18, 19), (17, 16) and (16, 13), respectively. Many previous studies indicated that Glu, Lys and Arg had higher occurrence in TPPs than MPPs [20, 27, 28, 35, 52–55]. For example, Haney et al [53] conducted a comprehensive analysis on 115 protein sequences from M. jannaschii.…”
Section: Results
Type: mentioning
confidence: 92%
“…Several computational efforts based on machine learning (ML) methods have been made in recent years to identify TPPs [20, 21, 24–33] as summarized in Table 1. As can be seen from Table 1, support vector machine (SVM) method is the most widely used technique for identifying TPPs [20, 21, 24–26, 28–30]. For instance, Zhang and Fan [31] developed the first TPP predictor based on amino acid composition (AAC) descriptors.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…They are relatively fast and unbiased in favor of a specific classifier. On the other hand, wrapper methods [10,11] use the performance of a classifier as the criterion function to assess the quality of a selected subset. The wrapper method generally achieves better classification performance than the filter method for the same number of selected genes, but it is also more time-consuming.…”
Section: Introduction
Type: mentioning
confidence: 99%
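
The wrapper idea in the last statement, scoring each candidate subset by the performance of a classifier, can be sketched as plain greedy forward selection. This is an illustration only: it is not the forward floating selection technique from the cited paper, and the nearest-centroid classifier, the leave-one-out criterion, and all function names are assumptions chosen to keep the example dependency-free.

```python
def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        # Predict the class whose centroid is closest (squared Euclidean distance).
        def dist(label):
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def forward_wrapper_selection(X, y, max_features):
    """Greedily add the feature that most improves the classifier-based criterion."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(nearest_centroid_loo_accuracy(X, y, selected + [f]), f)
                  for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no candidate improves the criterion, so stop
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```

Every candidate subset requires re-evaluating the classifier, which is why wrapper methods cost more time than filter methods, as the statement notes.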