Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents 2010
DOI: 10.1145/1871985.1872001

On the difficulty of clustering company tweets

Cited by 22 publications (7 citation statements)
References 9 publications
“…They tested their application with the WePS-3 Online Reputation Management Corpus and obtained 0.69 accuracy. In [13], company tweets in WePS-3 are clustered as true or false according to the term expansion methodology. In order to solve company name ambiguity, they employed a clustering technique different from our classification task.…”
Section: Company Disambiguation (mentioning)
confidence: 99%
“…Having a pre-processed subjective text with class labels, sentiment classification can be conducted at the document [13], sentence [14] or phrase levels [15] (where a phrase is part of a sentence) which we refer to as the granularity of the classification. Finally, knowing the source and the target of a sentiment is considered as one of the challenges of sentiment analysis that was addressed by number of researchers [16].…”
Section: Sentiment Analysis (mentioning)
confidence: 99%
“…Another study explored how Twitter can be used to construct a news processing system, from tweets by automatically grouping news tweets into clusters, such that each cluster consists of tweets relating to a particular topic (Sankaranarayanan et al, 2009). Perez-Tellez et al (2010), presented and compared a number of different methods based on clustering to determine whether a certain tweet refers to a specific company or not. Application of k-means clustering technique for masses consisting of a huge number of documents came up with the conclusion that when the documents' content is very short (as in the case of tweets), it is more appropriate to cluster the words instead of the documents. Therefore, a method that clusters the words using the word co-occurrence as a similarity measure was proposed by Khot (2010).…”
Section: Related Work (mentioning)
confidence: 99%
“…Karandikar (2010) stated "Such a short piece of text provides very few contextual clues for applying machine learning techniques". This type of data results in weak performance of most clustering methods due to the informal writing style of tweets that can be full of jargons, misspellings, colloquial and out of vocabulary words with poor grammatical structure (Perez-Tellez et al, 2010).…”
Section: The Problem (mentioning)
confidence: 99%