Proceedings. 2004 International Conference on Information and Communication Technologies: From Theory to Applications, 2004.
DOI: 10.1109/ictta.2004.1307829

Textmining, feature selection and datamining for proteins classification

Cited by 12 publications
(6 citation statements)
References 8 publications
“…In the second stage, we will determine the best size of k-grams by applying the off-line extraction algorithm [14] In the TABLE II, it can be seen that number of k-grams becomes too high from k=4, which degrades the performance of classifiers and increases the storage space and computing time. We then conclude that the compromise: k= 3grams / SVM is more appropriate in the evaluation of biological sequences which we will use in the next experiment to compare filter correlation based filter method and our contribution FCDBF.…”
Section: A Compromise: Classifier/ K-grams
Citation type: mentioning
confidence: 98%
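The statement above notes that the number of k-grams becomes too high from k=4. A minimal sketch of why, assuming the standard 20-letter amino acid alphabet: the number of possible k-grams is 20^k, so the descriptor space grows twentyfold with each increment of k. The function names and the toy sequence are illustrative and do not come from the cited papers.

```python
# Standard one-letter codes for the 20 amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def vocabulary_size(k, alphabet=AMINO_ACIDS):
    """Upper bound on the number of distinct k-grams over the alphabet."""
    return len(alphabet) ** k


def count_distinct_kgrams(sequences, k):
    """Number of distinct k-grams that actually occur in the sequences."""
    seen = set()
    for seq in sequences:
        # Slide a window of width k over the sequence.
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return len(seen)


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, vocabulary_size(k))
    # Hypothetical toy sequence, used only to show the call.
    print(count_distinct_kgrams(["MKVLAAGIV"], 3))
```

On real data the observed count is bounded by this maximum and by the total sequence length, but the upper bound already shows how quickly storage and computing time can grow.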
“…In the current study, we use transformations that have their foundations in the field of symbolic language analysis instead. They treat protein sequences as text from a 20 amino acid alphabet [30,31]. Here, short sequence fragments known as n-grams are understood as "words".…”
Section: Alignment Free Sequence Transformations
Citation type: mentioning
confidence: 99%
“…Previous works showed that the choice of n = 3 (3-grams) and boolean descriptors give a good compromise to produce accurate classifier [1]. We obtain a Boolean attribute -value dataset with several thousands of descriptors (Figure 3).…”
Section: The Protein Classification Problem
Citation type: mentioning
confidence: 99%
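The Boolean attribute-value dataset described in this statement can be sketched as follows: each protein becomes one row, each distinct 3-gram becomes one descriptor, and a cell is true when that 3-gram occurs in the sequence. This is an illustrative reading of the statement, not the authors' implementation; `kgrams`, `boolean_dataset`, and the toy sequences are hypothetical.

```python
def kgrams(seq, k=3):
    """Set of overlapping k-grams found in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def boolean_dataset(sequences, k=3):
    """Build a presence/absence table of k-grams.

    Returns (descriptors, rows): descriptors is the sorted list of all
    k-grams seen in any sequence, and rows[i][j] is True when
    descriptors[j] occurs in sequences[i].
    """
    per_sequence = [kgrams(seq, k) for seq in sequences]
    descriptors = sorted(set().union(*per_sequence))
    rows = [[gram in grams for gram in descriptors] for grams in per_sequence]
    return descriptors, rows


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to show the output shape.
    descriptors, rows = boolean_dataset(["MKVL", "KVLA"])
    print(descriptors)
    for row in rows:
        print(row)
```

With real protein families the descriptor list runs to several thousand columns, which is the sparsity and dimensionality issue that the citing papers address with feature selection.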
“…Their number is very high, which induces drawbacks: the computing time is very high and the quality of the learning classifier is often poor because we have a sparse dataset, and it is difficult to estimate in a reliable way the probability distribution ("The curse of Dimensionality Problem"). In a protein discrimination process from their primary structures [1], the native description of a protein is a succession of characters representing amino acids. It is not possible to run directly a learning algorithm.…”
Section: Introduction
Citation type: mentioning
confidence: 99%