Proceedings of the 2017 EMNLP Workshop: Natural Language Processing Meets Journalism 2017
DOI: 10.18653/v1/w17-4211

Unsupervised Event Clustering and Aggregation from Newswire and Web Articles

Abstract: In this paper, we present an unsupervised pipeline approach for clustering news articles based on identified event instances in their content. We leverage press agency newswire and monolingual word alignment techniques to build meaningful and linguistically varied clusters of articles from the Web in the perspective of a broader event type detection task. We validate our approach on a manually annotated corpus of Web articles.

Citations: cited by 13 publications (9 citation statements). References: 7 publications.
“…A large portion of work focuses on detecting keywords in order to cluster sentences or articles that express the same event or discuss a target topic in a text stream (i.e., topic tracking) [117]-[120], [127]-[132]. Documents are typically first converted into vector representations (e.g., BOW [131], TF-IDF [127], [130], lexical chains [128], multinomial distributions via Bayesian approaches [117]-[119], [129], or GANs [120]), sometimes augmented with named entities, time, and location [117], [119], [120], [129], [131], [132]. Each resulting cluster (produced, e.g., by k-means [132], agglomerative hierarchical clustering [127], sequence-based and iterative TF-IDF clustering [130], or Markov clustering [131]) brings together documents describing the same high-level event (termed a meta-event), whose simplified definition depends on the previously modeled and embedded information (i.e., semantically similar words, temporal proximity, etc.…”
Section: Unsupervised Learning (mentioning; confidence: 99%)
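The vectorization step described in this statement can be illustrated with a minimal TF-IDF sketch, written with the Python standard library only. It assumes naive lowercase whitespace tokenization and a smoothed IDF weighting (one of several common variants); the function names `tfidf_vectors` and `cosine` and the sample headlines are hypothetical and not taken from any of the cited works.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times a smoothed IDF (one common variant).
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "earthquake strikes coastal city",
    "strong earthquake hits coastal city",
    "central bank raises interest rates",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shared terms -> positive similarity
print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0
```

Pairwise similarities of this kind are what the clustering algorithms listed above (k-means, agglomerative clustering, Markov clustering) then operate on.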
“…It is based on simulating random walks over the nodes of a graph. Ribeiro et al. (2017) use this approach to cluster news articles.…”
Section: Clustering (mentioning; confidence: 99%)
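The random-walk idea behind Markov clustering can be sketched as follows: column-normalize the adjacency matrix into transition probabilities, then alternate expansion (a matrix power, i.e. longer random walks) and inflation (element-wise powering, which strengthens strong flows and weakens weak ones) until the matrix stops changing. This is a generic, unoptimized illustration using plain lists, not the configuration used by Ribeiro et al.; the added self-loops, inflation value of 2.0, convergence tolerance, and pruning threshold are assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adj, inflation=2.0, max_iter=100, tol=1e-6):
    n = len(adj)
    # Assumption: add self-loops, a common choice that avoids oscillation.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walks
        # Inflation: raise entries to a power, then renormalize columns.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with remaining mass (an attractor) defines one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-9)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Two disconnected triangles of articles (hypothetical similarity graph).
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_clustering(adj))
```

Higher inflation values generally yield finer-grained clusters; a production system would use sparse matrices and pruning rather than dense Python lists.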
“…Following the task definition of the TDT program, other studies have sought to detect whether new articles on various websites are related to an already identified event, without using the TDT corpus [32]-[37], [238]. For example, Naughton et al. [35] proposed vectorizing sentences from news articles using a bag-of-words encoding and clustering them with the agglomerative hierarchical clustering algorithm [239].…”
Section: A. Event Mention Detection and Tracking (mentioning; confidence: 99%)
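A bag-of-words plus agglomerative hierarchical clustering pipeline of the kind attributed to Naughton et al. [35] might look roughly like the sketch below. Average linkage over cosine similarity, the stopping threshold of 0.3, whitespace tokenization, and the example sentences are all assumptions made for illustration; the cited work may use different linkage criteria, weighting, or stopping rules.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector: raw term counts (naive whitespace tokenization)."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(c * v[t] for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(sentences, threshold=0.3):
    """Bottom-up clustering: repeatedly merge the most similar pair of clusters."""
    vecs = [bow(s) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def avg_link(a, b):
        # Average linkage: mean pairwise similarity between two clusters.
        return sum(cosine(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:  # stop once no pair is similar enough
            break
        x, y = pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x stays valid
    return clusters


sentences = [
    "flood hits the northern region",
    "severe flood hits northern region",
    "team wins the championship final",
    "home team wins championship",
]
print(agglomerative(sentences))
```

The threshold plays the role of cutting the dendrogram: a lower value merges more aggressively into fewer, broader event clusters.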
“…Besides using word and sentence embeddings, some have proposed exploiting additional information from news articles, such as time and location, to augment event mention detection. Ribeiro et al. [32] proposed integrating time, location, and content dimensions into the text representation, and applied an all-pairs similarity search algorithm and a Markov clustering algorithm to cluster news articles describing the same event. Likewise, Yu and Wu [33] proposed a Time2Vec representation technique that constructs an article representation from a context vector and a time vector, and employed a dual-level clustering algorithm for event detection.…”
Section: A. Event Mention Detection and Tracking (mentioning; confidence: 99%)
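One way to picture combining time, location, and content dimensions before clustering is a weighted pairwise similarity whose above-threshold pairs become edges of an article graph. The sketch below is hypothetical and is not Ribeiro et al.'s formulation: Jaccard word overlap for content, exponential decay over publication-date gaps, exact location matching, the weights, and the 0.5 threshold are all assumptions. It also enumerates all pairs by brute force (O(n²)) instead of using an optimized all-pairs similarity search.

```python
import math
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass
class Article:
    text: str
    published: date
    location: str


def jaccard(a, b):
    """Content similarity: overlap of lowercase word sets (assumption)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def combined_similarity(x, y, w_content=0.6, w_time=0.2, w_loc=0.2,
                        time_scale_days=3.0):
    content = jaccard(x.text, y.text)
    days = abs((x.published - y.published).days)
    time = math.exp(-days / time_scale_days)  # closer dates -> closer to 1
    loc = 1.0 if x.location == y.location else 0.0
    return w_content * content + w_time * time + w_loc * loc


def similar_pairs(articles, threshold=0.5):
    """Brute-force all-pairs search; returns weighted edges (i, j, score)."""
    edges = []
    for i, j in combinations(range(len(articles)), 2):
        s = combined_similarity(articles[i], articles[j])
        if s >= threshold:
            edges.append((i, j, s))
    return edges


articles = [
    Article("earthquake hits coastal city", date(2017, 5, 1), "Lisbon"),
    Article("earthquake damages coastal city", date(2017, 5, 2), "Lisbon"),
    Article("stock market falls sharply", date(2016, 1, 1), "Tokyo"),
]
edges = similar_pairs(articles)
print(edges)
```

The resulting weighted edge list could then be handed to a graph clustering algorithm such as Markov clustering, so that each connected group of mutually similar articles ends up in one event cluster.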