Since the introduction of BERT (Devlin et al., 2019), the research community has witnessed remarkable progress in the field of language model pre-training on large amounts of free text. Such advancements have led to significant progress in a wide range of natural language understanding (NLU) tasks (Yang et al., 2019; Clark et al., 2020; Lan et al., 2021) and text generation tasks (Radford et al., 2019; Lewis et al., 2020; Raffel et al., 2020; Su et al., 2021a,c,d,e,f,g; Zhong et al., 2021).

Contrastive Learning. Generally, contrastive learning methods distinguish observed data points from fictitious negative samples.
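As an illustrative sketch only, and not the exact objective of any of the cited works, one widely used instantiation of this idea is an InfoNCE-style loss that pulls the representation of an observed data point toward a positive view while pushing it away from negative samples:

\[
\mathcal{L}_{\text{CL}} = -\log \frac{\exp\big(\mathrm{sim}(h_i, h_i^{+}) / \tau\big)}{\exp\big(\mathrm{sim}(h_i, h_i^{+}) / \tau\big) + \sum_{j=1}^{N} \exp\big(\mathrm{sim}(h_i, h_j^{-}) / \tau\big)},
\]

where $h_i$ denotes the representation of an observed data point, $h_i^{+}$ that of a positive view, $h_j^{-}$ those of $N$ negative samples, $\mathrm{sim}(\cdot,\cdot)$ a similarity function (e.g., cosine similarity), and $\tau$ a temperature hyperparameter; these symbols are notational assumptions introduced here for illustration.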