Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1259
Using Content-level Structures for Summarizing Microblog Repost Trees

Abstract: A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of events on microblogging sites, we propose a novel repost tree summarization framework that effectively differentiates two kinds of messages on repost trees, called leaders and followers, which are derived from content-level structure information, i.e., the contents of messages and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect lea…

Cited by 21 publications (19 citation statements); references 19 publications.
“…We preprocessed the datasets before topic extraction in the following steps: 1) use the FudanNLP toolkit (Qiu et al., 2013) for word segmentation, stop-word removal, and POS tagging of the Chinese Weibo messages; 2) generate a vocabulary for each dataset and remove words occurring fewer than 5 times; 3) remove all hashtags from the texts before inputting them to the models, since the models are expected to extract topics without knowing the hashtags, which serve as the ground-truth topics; 4) for LeadLDA, use the CRF-based leader detection model (Li et al., 2015) to classify messages as leaders and followers. The leader detection model was implemented with CRF++, trained on the public dataset of 1,300 conversation paths, and achieved a state-of-the-art F1-score of 73.7% (Li et al., 2015).…”
Section: Data Collection and Experiment Setup
confidence: 99%
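The vocabulary-filtering and hashtag-removal steps quoted above can be sketched as follows. This is a minimal illustration, not the paper's pipeline (which uses FudanNLP for segmentation and CRF++ for leader detection); `preprocess` is a hypothetical helper that assumes the messages are already segmented into token lists.

```python
from collections import Counter

def preprocess(tokenized_messages, min_count=5):
    """Sketch of steps 2-3 of the quoted preprocessing: strip hashtags,
    then drop words occurring fewer than min_count times."""
    # Step 3: remove hashtags, since they serve as the ground-truth topics
    no_tags = [[t for t in msg if not t.startswith("#")]
               for msg in tokenized_messages]
    # Step 2: build a vocabulary and filter rare words
    counts = Counter(t for msg in no_tags for t in msg)
    vocab = {w for w, c in counts.items() if c >= min_count}
    filtered = [[t for t in msg if t in vocab] for msg in no_tags]
    return filtered, vocab
```

In the quoted setup, hashtag removal happens before vocabulary construction so that the evaluation topics never leak into the model input.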
“…Conversation tree structures from microblogs have previously been shown to help microblog summarization (Li et al., 2015), but have never been explored for topic modeling. We follow Li et al. (2015) in detecting leaders and followers along the paths of conversation trees using Conditional Random Fields (CRF) trained on annotated data. The detected leader/follower information is then incorporated as prior knowledge into our proposed topic model.…”
Section: Introduction
confidence: 99%
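The CRF in the statement above labels messages as leaders or followers along root-to-leaf conversation paths of a repost tree. A minimal sketch of extracting those paths, assuming a hypothetical adjacency-dict representation of the tree (the CRF training itself is not shown):

```python
def conversation_paths(repost_tree, root):
    """Enumerate root-to-leaf paths of a repost tree.

    repost_tree: dict mapping a message id to the ids of its reposts;
    each returned path is one sequence the CRF would label.
    """
    def walk(node, path):
        path = path + [node]
        children = repost_tree.get(node, [])
        if not children:          # leaf message: one complete path
            yield path
        for child in children:
            yield from walk(child, path)
    return list(walk(root, []))
```

Each path is then a linear chain of messages, which is what makes a linear-chain CRF applicable despite the branching tree structure.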
“…The raw Twitter and Reddit data was released by Zeng et al. (2018, 2019) and is in English. For both Twitter and Reddit, we form conversations from postings and replies (each comment or reply is also viewed as a single turn), following the practice of Li et al. (2015) and Zeng et al. (2018).…”
Section: Methods
confidence: 99%
“…It is arguable that these methods are suboptimal for recognizing salient content in short and informal messages due to the severe data-sparsity problem. Considering that microblogs allow users to form conversations on issues of interest by reposting with comments and replying to messages to voice opinions on previously discussed points, these conversations can enrich the context of short messages (Chang et al., 2013; Li et al., 2015) and have proven useful for identifying topic-related content (Li et al., 2016). For example, Table 1 displays a target post with the keyphrase "president Duterte" and its reposting and replying messages forming a conversation.…”
Section: Introduction
confidence: 99%
“…In this paper, we present a neural keyphrase extraction framework that exploits conversation context, represented by neural encoders, to capture salient content that helps indicate keyphrases in target posts. Conversation context has proven useful in many NLP tasks on social media, such as sentiment analysis (Ren et al., 2016), summarization (Chang et al., 2013; Li et al., 2015), and sarcasm detection (Ghosh et al., 2017). We use four context encoders in our model, namely averaged embedding, RNN (Pearlmutter, 1989), attention (Bahdanau et al., 2014), and memory networks (Weston et al., 2015), which have proven useful in text representation (Weston et al., 2015; Nie et al., 2017).…”
Section: Introduction
confidence: 99%
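Of the four context encoders listed above, averaged embedding is the simplest: the conversation context is represented by the mean of its word vectors. A minimal pure-Python sketch; `averaged_embedding` and its zero-vector fallback for unknown tokens are illustrative assumptions, not the paper's implementation.

```python
def averaged_embedding(tokens, embeddings, dim):
    """Encode a context as the element-wise mean of its word vectors.

    embeddings: dict mapping token -> vector (list of floats of length dim);
    out-of-vocabulary tokens fall back to a zero vector.
    """
    vecs = [embeddings.get(t, [0.0] * dim) for t in tokens]
    if not vecs:                      # empty context: zero vector
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

Unlike the RNN, attention, and memory-network encoders, this baseline ignores word order entirely, which is why it is typically the weakest but cheapest of the four.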