Proceedings of the 22nd ACM International Conference on Information & Knowledge Management 2013
DOI: 10.1145/2505515.2505695

Building a large-scale corpus for evaluating event detection on twitter

Abstract: Despite the popularity of Twitter for research, there are very few publicly available corpora, and those which are available are either too small or unsuitable for tasks such as event detection. This is partially due to a number of issues associated with the creation of Twitter corpora, including restrictions on the distribution of the tweets and the difficulty of creating relevance judgements at such a large scale. The difficulty of creating relevance judgements for the task of event detection is further ham…

Cited by 171 publications (128 citation statements); references 15 publications.
“…McMinn et al [55] propose a methodology for creating a corpus to evaluate event detection methods. They used two existing state-of-the art event detection approaches [28,54] together with Wikipedia to create a set of candidate events together with a list of associated tweets.…”
Section: Available Corpora For Evaluation (mentioning)
“…For example, the organizers of the 2014 SNOW challenge [12] could only crawl 1 106 712 of the original 3 630 816 tweets of the above-mentioned 2012 US Presidential Election data set [37]. In order to assess how useable these collections of tweet identifiers are, we attempted to download the corpus of McMinn et al [55]. The standard restriction of crawling tweets with the Twitter API 4 is set to 180 queries per 15 minute window.…”
Section: Available Corpora For Evaluation (mentioning)
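To make the re-crawling cost in the statement above concrete, here is a minimal back-of-the-envelope sketch. It assumes one tweet is retrieved per API query (the quoted text gives the limit of 180 queries per 15-minute window but does not say how many tweets each query returns), and the function name crawl_time_days is hypothetical, not part of any Twitter library.

```python
import math


def crawl_time_days(num_tweets: int,
                    queries_per_window: int = 180,
                    window_minutes: int = 15,
                    tweets_per_query: int = 1) -> float:
    """Estimate the days needed to re-download a corpus of tweet IDs.

    Assumption: each query returns at most `tweets_per_query` tweets and
    the crawler waits for a fresh rate-limit window once the quota is used.
    """
    # Number of API calls needed to cover every tweet ID.
    queries = math.ceil(num_tweets / tweets_per_query)
    # Number of rate-limit windows those calls occupy (a partial window counts fully).
    windows = math.ceil(queries / queries_per_window)
    # Convert total minutes into days.
    return windows * window_minutes / (60 * 24)


if __name__ == "__main__":
    # 3,630,816 tweets: size of the 2012 US Presidential Election set cited above.
    print(f"{crawl_time_days(3_630_816):.1f} days")
```

Under the one-tweet-per-query assumption, re-crawling the 3,630,816-tweet election set would take roughly 210 days. If an endpoint accepted several IDs per request (also an assumption here), the estimate would shrink roughly in proportion to tweets_per_query.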
“…On the other hand, Aggarwal, et al [3] states that a news event is "something that happens at a specific time and place, but it is also an object of interest to the news media". Similarly, McMinn et al [9] define an event as something significant happening in a specific time and place beside it lead to discussions by the news media. This event might be a political event, natural disaster, terror attack or a protest, etc.…”
Section: Event Definition (mentioning)
“…Thus, the methods and techniques used for these kinds of events should be evaluated in terms of how fast they can be identified rather than just evaluating based on precision and recall measurements [1]. Unfortunately, there are very few ED evaluation datasets [9]. The TDT5 dataset has been utilized by many studies [43], to evaluate precision.…”
Section: Evaluation Challenges (mentioning)
“…We test our approach on two gold standard corpora: the First Story Detection (FSD) corpus (Petrović et al, 2012) and the EVENT2012 corpus (McMinn et al, 2013).…”
Section: Dataset (mentioning)