Research and Development in Intelligent Systems XXIX 2012
DOI: 10.1007/978-1-4471-4739-8_16

R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

Abstract: Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated…

Citations: cited by 13 publications (14 citation statements)
References: 9 publications
“…Character n-grams have been used for email spam detection [11] and sentiment classification [1] with higher effectiveness than using word n-grams. They are also considered the state-of-the-art for authorship attribution [19].…”
Section: Related Work
mentioning
confidence: 99%
“…Big social data offers the potential for new insights into human behaviour and development of robust models capable of describing individuals and societies [6]. Social media has been used in varying computer system approaches; in the past this has mainly been the textual information contained in blogs, status posts and photo comments [2,3], but there is also a wealth of information in the other ways of interacting with online artefacts. Research in image or video analysis includes promising studies on YouTube videos for classification of specific behaviours and indicators of personality traits [1].…”
mentioning
confidence: 99%
“…Analyzing media with a very informal language benefits from involving novel features, such as emoticons (Pak & Paroubek, 2010; Montejo-Ráez, Martínez-Cámara, Martín-Valdivia, & Ureña López, 2012), character n-grams (Blamey, Crick, & Oatley, 2012), POS and POS ratio (Ahkter & Soria, 2010; Kouloumpis et al, 2011), or word shape (Go et al, 2009; Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2011).…”
Section: Supervised Machine Learning For Sentiment Analysis
mentioning
confidence: 99%
“…Similarly to the word n-gram features, we added character n-gram features, as proposed by, e.g., (Blamey et al, 2012). We set the minimum occurrence of a particular character n-gram to five, in order to prune the feature space.…”
Section: Character N-gram Features
mentioning
confidence: 99%
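
The pruning step described in the statement above (dropping character n-grams that occur fewer than five times) can be illustrated with a minimal sketch. This is not the citing authors' implementation: the function names, the n-gram order `n=3`, and the choice to count total occurrences across the corpus (rather than document frequency) are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(documents, n=3, min_count=5):
    """Collect character n-grams occurring at least `min_count` times.

    Assumption: occurrences are summed over the whole corpus; the
    quoted statement does not say how the minimum occurrence is counted.
    """
    counts = Counter()
    for doc in documents:
        counts.update(char_ngrams(doc, n))
    return sorted(gram for gram, c in counts.items() if c >= min_count)


def vectorize(text, vocabulary, n=3):
    """Map a document to per-n-gram counts over the pruned vocabulary."""
    counts = Counter(char_ngrams(text, n))
    return [counts[gram] for gram in vocabulary]


# Toy corpus (hypothetical data): one message repeated five times
# plus a single message with a different emoticon.
docs = ["good :-)"] * 5 + ["bad :-("]
vocab = build_vocabulary(docs, n=3, min_count=5)
features = vectorize("good :-)", vocab)
```

In this toy corpus the trigram `:-)` occurs five times and survives pruning, while `:-(` occurs once and is dropped, which shows how a minimum-occurrence threshold shrinks the feature space at the cost of discarding rare but possibly informative n-grams.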