2014
DOI: 10.1155/2014/173869

Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Abstract: Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics are to store and manage biological data, and to develop and analyze computational tools that enhance the understanding of that data. The size of the data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving c…

citations
Cited by 25 publications
(14 citation statements)
references
References 38 publications
“…Feature selection in the context of protein classification is conducted on the k-mers obtained from a protein sequence instead of the original one [see Leslie et al, 2004;Iqbal et al]. It can be observed that the RKHS H produced by the string kernel is finite dimensional and hence satisfies the regularity conditions on the RKHS trivially, and hence, the coordinates of the transformed space (the k-mers) can be used directly for feature selection.…”
Section: Case Study 3: Protein Classification With Mismatch String Ke (mentioning)
confidence: 99%
“…The n-grams are, in general, contiguous specific amino acid subsequences of length n. This concept is commonly used in text and natural language processing, but it has also been used in the context of protein analysis [15], [16], [17], [18], even by direct transposition of text classification methods for the classification of GPCRs [19].…”
Section: B. The N-gram GPCR Sequence Transformation (mentioning)
confidence: 99%
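The n-gram idea in the statement above (contiguous amino acid subsequences of length n, which the earlier statement calls k-mers) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not code from any of the cited papers: the function name `ngram_counts` and the toy sequence are hypothetical.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count every contiguous length-n window (n-gram / k-mer) in `sequence`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 windows; none if it is shorter than n.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical toy sequence, chosen only so that one 3-gram repeats.
toy = "MKVLAAGMKV"
counts = ngram_counts(toy, 3)
for gram, count in counts.most_common():
    print(gram, count)
```

The resulting counts can serve as the coordinates of a feature vector, one coordinate per distinct n-gram.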
“…Muhammad Javed Iqbal et al [16] proposed a feature subset selection technique whereby the statistical significance of each feature of a superfamily from all other superfamilies is measured. This technique was applied on a protein sequence represented by a vector of 8420 features.…”
Section: Introduction (mentioning)
confidence: 99%
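The statement above does not say which significance measure is used, so the following is only a rough sketch of the general idea of scoring each feature of one superfamily against all others. Welch's t-statistic is swapped in here as an assumed stand-in, and the names `welch_t` and `rank_features` and the toy data are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least two values)."""
    diff = mean(a) - mean(b)
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        # Both samples are constant: no difference, or a perfect separation.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / denom


def rank_features(X, y, target):
    """Order feature indices by |t| between rows labeled `target` and all other rows."""
    inside = [row for row, label in zip(X, y) if label == target]
    outside = [row for row, label in zip(X, y) if label != target]
    scores = []
    for j in range(len(X[0])):
        t = welch_t([r[j] for r in inside], [r[j] for r in outside])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical toy data: feature 0 separates superfamily "A", feature 1 does not.
X = [[5, 1], [6, 2], [0, 1], [1, 2]]
y = ["A", "A", "B", "B"]
ranking = rank_features(X, y, "A")
print(ranking)
```

A feature subset would then be taken from the front of the ranking; how many features to keep, and which significance threshold applies, are choices the statement leaves open.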