Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval 2012
DOI: 10.1145/2348283.2348412

Improving tweet stream classification by detecting changes in word probability

Abstract: We propose a classification model of tweet streams in Twitter, which are representative of document streams whose statistical properties will change over time. Our model solves several problems that hinder the classification of tweets; in particular, the problem that the probabilities of word occurrence change at different rates for different words. Our model switches between two probability estimates based on full and recent data for each word when detecting changes in word probability. This switching enables…
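
The per-word switching idea in the abstract can be illustrated with a small sketch. This is not the authors' model: the class name `SwitchingWordEstimator`, the sliding window of recent documents, the additive smoothing, and the relative-difference threshold used as the change test are all assumptions made for illustration, since the abstract does not specify how changes are detected.

```python
from collections import Counter, deque


class SwitchingWordEstimator:
    """Hypothetical sketch: per-word probability that switches between a
    full-history estimate and a recent-window estimate."""

    def __init__(self, window_size=50, threshold=0.5, alpha=1.0):
        self.window_size = window_size  # recent documents kept (assumed)
        self.threshold = threshold      # relative-difference cutoff (assumed)
        self.alpha = alpha              # additive smoothing constant (assumed)
        self.full_counts = Counter()
        self.full_total = 0
        self.recent_docs = deque()
        self.recent_counts = Counter()
        self.recent_total = 0
        self.vocab = set()

    def update(self, tokens):
        """Add one tokenized document to both the full and recent statistics."""
        self.full_counts.update(tokens)
        self.full_total += len(tokens)
        self.vocab.update(tokens)
        self.recent_docs.append(tokens)
        self.recent_counts.update(tokens)
        self.recent_total += len(tokens)
        # Evict the oldest document once the window is over capacity.
        if len(self.recent_docs) > self.window_size:
            old = self.recent_docs.popleft()
            self.recent_counts.subtract(old)
            self.recent_total -= len(old)

    def _smoothed(self, count, total):
        vocab_size = max(len(self.vocab), 1)
        return (count + self.alpha) / (total + self.alpha * vocab_size)

    def full_prob(self, word):
        return self._smoothed(self.full_counts[word], self.full_total)

    def recent_prob(self, word):
        return self._smoothed(self.recent_counts[word], self.recent_total)

    def changed(self, word):
        """Assumed change test: relative gap between the two estimates."""
        p_full = self.full_prob(word)
        p_recent = self.recent_prob(word)
        return abs(p_recent - p_full) / p_full > self.threshold

    def prob(self, word):
        """Use the recent estimate only for words whose probability changed."""
        if self.changed(word):
            return self.recent_prob(word)
        return self.full_prob(word)


# Toy stream: "banana" disappears and "cherry" appears, "apple" stays stable.
est = SwitchingWordEstimator(window_size=50, threshold=0.5)
for _ in range(100):
    est.update(["apple", "banana"])
for _ in range(50):
    est.update(["apple", "cherry"])

for w in ("apple", "banana", "cherry"):
    print(w, est.changed(w), round(est.prob(w), 4))
```

In this toy stream the stable word keeps its full-history estimate, while the words whose frequency shifted fall back to the recent window. A classifier such as Naive Bayes could, under these assumptions, keep one estimator per class and use `prob` for its word likelihoods.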

Cited by 30 publications (22 citation statements)
References 34 publications

“…Besides traditional text classification techniques, some recent works have focused on short text classification [24], [25]. [25] has extracted eight features for 5-class classification (i.e., news, events, opinions, deals, and private messages).…”
Section: Methods (mentioning)
confidence: 99%
“…Limited by its small parameter space, this method cannot handle classification problem with millions of hashtags as class labels. [24] has further considered changes in word probability to classify Tweet stream, which is only based on texts. It does not consider Twitter-specific features nor user's preferences.…”
Section: Methods (mentioning)
confidence: 99%
“…In the same year Alvanaki et al [1] proposed a system "enBlogue", which analyzes statistics about tags and tag pairs for identifying unusual shifts in correlations. Further recent work proposed by Nishida et al [15] shows a classification model of tweet streams for identifying changes in statistical properties on word basis, which is used for topic classification. Also in the same year Zimmermann et al [23] propose a text stream clustering method that detects, tracks and updates large and small bursts of news in a two-level topic hierarchy.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, opinions are not the focus of our work. Nishida et al [2012] presented a wide range of tweet classification frameworks using a temporally aware Naïve Bayes classifier. Their experiments were conducted on a data set in which classes were defined based on their hashtags.…”
Section: Tweet Classification (mentioning)
confidence: 99%