Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018
DOI: 10.18653/v1/d18-1483

Multilingual Clustering of Streaming News

Abstract: Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever-growing nu…
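
The online setting described in the abstract, where each incoming document either joins an existing story or opens a new one, can be illustrated with a minimal sketch. This is not the paper's method: the bag-of-words cosine similarity, the fixed `threshold`, and the names `OnlineClusterer` and `cosine` are all assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Hypothetical single-pass clusterer: the number of clusters is not fixed in advance."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold  # assumed similarity cut-off for joining a cluster
        self.centroids = []         # one summed term-weight dict per cluster

    def add(self, doc):
        """Assign one document to its most similar cluster, or open a new one."""
        best_id, best_sim = None, 0.0
        for cid, centroid in enumerate(self.centroids):
            sim = cosine(doc, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is None or best_sim < self.threshold:
            # No sufficiently similar story yet: start a new cluster.
            self.centroids.append(dict(doc))
            return len(self.centroids) - 1
        # Otherwise fold the document into the chosen cluster's centroid.
        centroid = self.centroids[best_id]
        for term, weight in doc.items():
            centroid[term] = centroid.get(term, 0.0) + weight
        return best_id


stream = [
    {"earthquake": 1.0, "japan": 1.0},
    {"earthquake": 1.0, "tokyo": 1.0},
    {"election": 1.0, "france": 1.0},
]
clusterer = OnlineClusterer(threshold=0.4)
labels = [clusterer.add(doc) for doc in stream]
```

Because every document is processed once and clusters are created on demand, the set of stories can keep growing as the stream continues.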

Cited by 25 publications (61 citation statements)
References: 17 publications
“…The task of identifying and tracking events was first introduced in the Topic Detection and Tracking challenge (Allan et al., 1998). Recent work has explored new methods for tracking and visualizing such events over time (e.g., Laban and Hearst, 2017; Miranda et al., 2018; Staykovski et al., 2019; Saravanakumar et al., 2021), in some cases generating summaries that contain information on what is new (e.g., Kedzie et al., 2015, 2018) and in other cases exploring timeline summarization, ordering events and generating summaries that are placed along a timeline (e.g., Wang et al., 2015; Binh Tran et al., 2013; Nguyen et al., 2014). We will also consider how these are related to summarization of an event that takes place within a single day, a problem that falls within the category of multidocument summarization (e.g., Liu and Lapata, 2019; Fabbri et al., 2019), as typically there may be many articles covering the same event. By using multiple articles as input, a summarizer can present different perspectives on the same event as well as identify salient information that is highlighted in many different ways across the set of input articles.…”
Section: Event Summarization [30min]
type: mentioning
confidence: 99%
“…The model has state-of-the-art performance on the test partition of the corpus from Miranda et al. (2018): F1 = 98.11 and F1 BCubed = 94.41 (F1 BCubed is an evaluation measure specifically designed to evaluate clustering algorithms (Amigó et al., 2009)). As a comparison, the best model in (Miranda et al., 2018) has F1 = 94.1 (see Staykovski et al. (2019) for further details). In Figure 2 the article from the Sputnik is flagged as likely to be propagandistic by our system.…”
Section: Event Identification / Clustering
type: mentioning
confidence: 99%
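
For readers unfamiliar with the BCubed measure mentioned in the statement above, the following is a minimal sketch of how BCubed precision, recall, and F1 are commonly computed from a predicted and a gold clustering. It assumes the usual formulation (per-document precision and recall averaged over all documents, then combined by harmonic mean); the function name `bcubed_f1` and the toy labels are hypothetical, and the cited papers may differ in details.

```python
from collections import Counter


def bcubed_f1(predicted, gold):
    """Return (precision, recall, f1) for two dicts mapping doc id -> cluster label."""
    docs = list(gold)
    # How many documents share each (predicted cluster, gold cluster) pair.
    overlap = Counter((predicted[d], gold[d]) for d in docs)
    pred_sizes = Counter(predicted[d] for d in docs)
    gold_sizes = Counter(gold[d] for d in docs)

    # Per-document precision: share of its predicted cluster with the same gold label.
    precision = sum(
        overlap[(predicted[d], gold[d])] / pred_sizes[predicted[d]] for d in docs
    ) / len(docs)
    # Per-document recall: share of its gold cluster found in the same predicted cluster.
    recall = sum(
        overlap[(predicted[d], gold[d])] / gold_sizes[gold[d]] for d in docs
    ) / len(docs)

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four documents, two gold stories, one document misplaced.
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
gold = {"a": "x", "b": "x", "c": "y", "d": "y"}
p, r, f = bcubed_f1(predicted, gold)
```

In the toy example, precision is 2/3, recall is 3/4, and F1 is 12/17 (about 0.706).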
“…We selected all parameters empirically on the training part of the corpus from Miranda et al. (2018). The sequence of overlapping local graphs is merged in the order of their creation, thus generating stories from the topics.…”
Section: Event Identification / Clustering
type: mentioning
confidence: 99%