Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1259
Using Content-level Structures for Summarizing Microblog Repost Trees

Abstract: A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of events on microblogging sites, we propose a novel repost tree summarization framework that effectively differentiates two kinds of messages on repost trees, called leaders and followers, which are derived from content-level structure information, i.e., the contents of messages and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect lea…

Cited by 21 publications (19 citation statements); references 19 publications.
“…We preprocessed the datasets before topic extraction in the following steps: 1) use the FudanNLP toolkit (Qiu et al., 2013) for word segmentation, stop-word removal, and POS tagging of the Chinese Weibo messages; 2) generate a vocabulary for each dataset and remove words occurring fewer than 5 times; 3) remove all hashtags from the texts before inputting them to the models, since the models are expected to extract topics without knowing the hashtags, which serve as the ground-truth topics; 4) for LeadLDA, use the CRF-based leader detection model (Li et al., 2015) to classify messages as leaders and followers. The leader detection model was implemented with CRF++, trained on the public dataset of 1,300 conversation paths, and achieved a state-of-the-art F1-score of 73.7% (Li et al., 2015).…”
Section: Data Collection and Experiment Setup
confidence: 99%
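The vocabulary-filtering and hashtag-removal steps quoted above can be sketched as follows. This is a minimal illustration, not the paper's pipeline (which uses FudanNLP for segmentation and CRF++ for leader detection); `preprocess` is a hypothetical helper that assumes the messages are already segmented into token lists.

```python
from collections import Counter

def preprocess(tokenized_messages, min_count=5):
    """Sketch of steps 2-3 of the quoted preprocessing: strip hashtags,
    then drop words occurring fewer than min_count times."""
    # Step 3: remove hashtags, since they serve as the ground-truth topics
    no_tags = [[t for t in msg if not t.startswith("#")]
               for msg in tokenized_messages]
    # Step 2: build a vocabulary and filter rare words
    counts = Counter(t for msg in no_tags for t in msg)
    vocab = {w for w, c in counts.items() if c >= min_count}
    filtered = [[t for t in msg if t in vocab] for msg in no_tags]
    return filtered, vocab
```

In the quoted setup, hashtag removal happens before vocabulary construction so that the evaluation topics never leak into the model input.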
“…Conversation tree structures from microblogs have previously been shown to help microblog summarization (Li et al., 2015), but have never been explored for topic modeling. We follow Li et al. (2015) in detecting leaders and followers along the paths of conversation trees using Conditional Random Fields (CRF) trained on annotated data. The detected leader/follower information is then incorporated as prior knowledge into our proposed topic model.…”
Section: Introduction
confidence: 99%
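The CRF in the statement above labels messages as leaders or followers along root-to-leaf conversation paths of a repost tree. A minimal sketch of extracting those paths, assuming a hypothetical adjacency-dict representation of the tree (the CRF training itself is not shown):

```python
def conversation_paths(repost_tree, root):
    """Enumerate root-to-leaf paths of a repost tree.

    repost_tree: dict mapping a message id to the ids of its reposts;
    each returned path is one sequence the CRF would label.
    """
    def walk(node, path):
        path = path + [node]
        children = repost_tree.get(node, [])
        if not children:          # leaf message: one complete path
            yield path
        for child in children:
            yield from walk(child, path)
    return list(walk(root, []))
```

Each path is then a linear chain of messages, which is what makes a linear-chain CRF applicable despite the branching tree structure.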
“…The raw Twitter and Reddit data was released by Zeng et al. (2018, 2019) and is in English. For both Twitter and Reddit, we form conversations from postings and replies (each comment or reply is also viewed as a single turn), following the practice of Li et al. (2015) and Zeng et al. (2018).…”
Section: Methods
confidence: 99%
“…It is arguable that these methods are suboptimal for recognizing salient content in short and informal messages due to the severe data-sparsity problem. Considering that microblogs allow users to form conversations on issues of interest by reposting with comments and replying to messages to voice opinions on previously discussed points, these conversations can enrich the context of short messages (Chang et al., 2013; Li et al., 2015) and have proven useful for identifying topic-related content (Li et al., 2016). For example, Table 1 displays a target post with the keyphrase "president Duterte" and its reposting and replying messages forming a conversation.…”
Section: Introduction
confidence: 99%
“…In this paper, we present a neural keyphrase extraction framework that exploits conversation context, represented by neural encoders, to capture salient content that helps indicate keyphrases in target posts. Conversation context has proven useful in many NLP tasks on social media, such as sentiment analysis (Ren et al., 2016), summarization (Chang et al., 2013; Li et al., 2015), and sarcasm detection (Ghosh et al., 2017). We use four context encoders in our model, namely averaged embedding, RNN (Pearlmutter, 1989), attention (Bahdanau et al., 2014), and memory networks (Weston et al., 2015), which have proven useful in text representation (Weston et al., 2015; Nie et al., 2017).…”
Section: Introduction
confidence: 99%
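Of the four context encoders listed above, averaged embedding is the simplest: the conversation context is represented by the mean of its word vectors. A minimal pure-Python sketch; `averaged_embedding` and its zero-vector fallback for unknown tokens are illustrative assumptions, not the paper's implementation.

```python
def averaged_embedding(tokens, embeddings, dim):
    """Encode a context as the element-wise mean of its word vectors.

    embeddings: dict mapping token -> vector (list of floats of length dim);
    out-of-vocabulary tokens fall back to a zero vector.
    """
    vecs = [embeddings.get(t, [0.0] * dim) for t in tokens]
    if not vecs:                      # empty context: zero vector
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

Unlike the RNN, attention, and memory-network encoders, this baseline ignores word order entirely, which is why it is typically the weakest but cheapest of the four.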