2012 International Conference on Asian Language Processing (2012)
DOI: 10.1109/ialp.2012.64

Weirdness Coefficient as a Feature Selection Method for Arabic Special Domain Text Classification

Cited by 4 publications
(4 citation statements)
References 12 publications
“…TC algorithms require that text features are formatted before they can be interpreted by the specified classifier, this process is also referred to as term weighting because each term is entered together with a weight value. Included papers show the most used technique is the Term Frequency-Inverse Document Frequency (TF-IDF) as in [27,32,37,40,43,45,48,51,53,55,57,58,60,61,62,67]. It is a statistical method to indicate the significance of a word within a given corpus.…”
Section: E Feature Representation (Term Weighting)
Citation type: mentioning
confidence: 99%
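The term weighting described in this statement can be illustrated with a minimal sketch. It assumes the plain `tf * log(N / df)` variant of TF-IDF; the citing papers may use other variants, and the function name `tf_idf` and the toy tokens are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the basic variant: weight = tf * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus of already-tokenized documents (illustrative only).
docs = [
    ["text", "arabic", "text"],
    ["corpus", "text"],
    ["domain", "corpus"],
]
w = tf_idf(docs)
```

In this toy corpus, `arabic` occurs once in the first document and nowhere else, while `text` occurs twice there but also appears in a second document, so `arabic` ends up with the larger weight: the score reflects both frequency within a document and rarity across the corpus.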
“…This utilization of the technique is justified assuming the authors wanted to weight terms while considering its significance across all documents rather than a single one. Although, in [58] a simpler but more limited method has also been used to conclude a Boolean value of zero or one, a term can be described to be either important or not important. Whilst in TF-IDF, for a given term, a bigger TF-IDF value indicates a more frequent word.…”
Section: E Feature Representation (Term Weighting)
Citation type: mentioning
confidence: 99%
“…Approaches such as term frequencies, term frequency-inverse document frequency (TF-IDF), weirdness coefficients, information gain and chi-squared are also used to extract quality terms and phrases. Al-Thubaity et al (2012) proposed a method to classify Arabic special texts using weirdness coefficients, and promising classification accuracy is obtained with respect to chi-squared. In this study, it was necessary to extract health-related phrases while evaluating the quality and coverage of websites.…”
Section: Quality Term Extraction From Content
Citation type: mentioning
confidence: 99%
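For readers unfamiliar with the weirdness coefficient named in this statement, here is a minimal sketch. It assumes the definition commonly used in the terminology-extraction literature: the relative frequency of a term in a special-domain corpus divided by its relative frequency in a general corpus. The excerpt does not show the exact formulation used by Al-Thubaity et al., so the `smoothing` parameter, the function names, and the toy tokens are all assumptions made for illustration.

```python
from collections import Counter

def weirdness(special_tokens, general_tokens, smoothing=1.0):
    """Score each special-corpus term against a general corpus.

    weirdness(t) = (f_special(t) / N_special) / (f_general(t) / N_general)
    `smoothing` is an assumption added to the general-corpus count so that
    terms absent from the general corpus do not cause division by zero.
    """
    fs = Counter(special_tokens)
    fg = Counter(general_tokens)
    n_s = len(special_tokens)
    n_g = len(general_tokens)
    return {t: (fs[t] / n_s) / ((fg[t] + smoothing) / n_g) for t in fs}

def select_features(scores, k):
    """Keep the k terms with the highest weirdness scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]

# Toy corpora (illustrative only).
special = ["dose", "dose", "the", "patient"]
general = ["the", "the", "the", "a", "patient", "of"]
scores = weirdness(special, general)
top_terms = select_features(scores, 2)
```

Terms that are frequent in the special corpus but rare in general language score highest, which is why the measure can serve as a feature selection criterion for domain-specific classification.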
“…The results revealed better performance in classification with PCA. In [24], the researchers performed text mining on an Arabic dataset from King Abdulaziz City for Science and Technology (KACST), covering five fields with about 2,243 texts from the Islamic topics of Feqh, Tafseer, Lughah, Aqeedah, and Hadeet. The researchers employed three schemas for representation-Boolean, LTC, and TF-IDF.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
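The Boolean and LTC representation schemas mentioned in this statement can be sketched as follows. This assumes LTC follows the SMART notation (logarithmic term frequency, inverse document frequency, cosine normalization); the cited study may define it slightly differently, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def boolean_weights(doc):
    """Boolean schema: 1 for every term present in the document."""
    return {t: 1 for t in set(doc)}

def ltc_weights(doc, df, n_docs):
    """LTC schema (assumed SMART 'ltc'): (1 + log tf) * log(N / df),
    followed by cosine (unit-length) normalization."""
    tf = Counter(doc)
    raw = {t: (1 + math.log(tf[t])) * math.log(n_docs / df[t]) for t in tf}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    # Guard against an all-zero vector (every term occurs in every document).
    return {t: (v / norm if norm else 0.0) for t, v in raw.items()}

# Toy corpus of already-tokenized documents (illustrative only).
docs = [["a", "a", "b"], ["a", "c"], ["c", "d"]]
df = Counter(term for doc in docs for term in set(doc))

bool_vec = boolean_weights(docs[0])
ltc_vec = ltc_weights(docs[0], df, len(docs))
```

The Boolean vector records only presence, while the LTC vector dampens repeated occurrences with the logarithm and scales each document to unit length so that long and short texts are comparable.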