An Effective Domain Adaptive Post-Training Method for BERT in Response Selection

Whang, Taesun; Lee, Dongyub; Lee, Chanhee; Yang, Kisu; Oh, Dongsuk; Lim, Heuiseok

doi:10.48550/arxiv.1908.04812

Cited by 20 publications

(34 citation statements)

References 14 publications


“…We fine-tune BERT (Devlin et al, 2019) (bert-base-cased) for conversation response ranking using huggingface-transformers (Wolf et al, 2019). We follow recent research in IR that employed fine-tuned BERT for retrieval tasks (Nogueira and Cho, 2019), including conversation response ranking (Penha and Hauff, 2020; Vig and Ramea, 2019; Whang et al, 2019). When training BERT we employ a balanced number of relevant and non-relevant context and response pairs, with the non-relevant pairs sampled using BM25 (Robertson and Walker, 1994).…”

Section: Implementation Details

mentioning

confidence: 99%

On the Calibration and Uncertainty of Neural Learning to Rank Models

Penha, Hauff. 2021. Preprint.

According to the Probability Ranking Principle (PRP), ranking documents in decreasing order of their probability of relevance leads to an optimal document ranking for ad-hoc retrieval. The PRP holds when two conditions are met: [C1] the models are well calibrated, and [C2] the probabilities of relevance are reported with certainty. We know, however, that deep neural networks (DNNs) are often not well calibrated and have several sources of uncertainty, and thus [C1] and [C2] might not be satisfied by neural rankers. Given the success of neural Learning to Rank (L2R) approaches, and here especially BERT-based approaches, we first analyze under which circumstances deterministic neural rankers, i.e. those that output point estimates, are calibrated. Then, motivated by our findings, we use two techniques to model the uncertainty of neural rankers, leading to the proposed stochastic rankers, which output a predictive distribution of relevance as opposed to point estimates. Our experimental results on the ad-hoc retrieval task of conversation response ranking reveal that (i) BERT-based rankers are not robustly calibrated and that stochastic BERT-based rankers yield better calibration; and (ii) uncertainty estimation is beneficial for both risk-aware neural ranking, i.e. taking into account the uncertainty when ranking documents, and for predicting unanswerable conversational contexts.
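To make condition [C1] concrete, the following is a minimal sketch of one common way to quantify calibration for binary relevance: a binned expected calibration error. The abstract does not state which metric the authors use, so the choice of metric, the function name `expected_calibration_error`, and the default of 10 equal-width bins are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned calibration error for binary relevance predictions.

    Assumed setup (not taken from the abstract): `probs` are predicted
    probabilities of relevance in [0, 1], `labels` are 0/1 judgments.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Group predictions into equal-width probability bins.
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    # Weighted average of |observed relevance rate - mean predicted probability|.
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        frac_relevant = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(frac_relevant - mean_prob)
    return ece
```

A value near zero means that, within each bin, the predicted probability of relevance is close to the observed fraction of relevant items; larger values indicate over- or under-confidence.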


“…Generally, most models formulate the response selection task as a dialog-response binary classification task. Whang et al (2019) first applied BERT for multi-turn response selection and obtained state-of-the-art results through further training BERT on a domain-specific corpus. Subsequent studies (Lu et al, 2020; Gu et al, 2020) focused on modeling speaker information and showed its effectiveness in response retrieval.…”

Section: Related Work

mentioning

confidence: 99%
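The dialog-response binary classification formulation mentioned in the statement above can be sketched as follows. This is an illustrative assumption rather than any cited paper's code: the `PairExample` class, the `build_pair_examples` function, and the `[EOT]` turn separator are hypothetical names, and in practice a tokenizer would normally add the `[CLS]` and `[SEP]` markers itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairExample:
    text: str   # context and candidate response packed into one sequence
    label: int  # 1 if the candidate is the true next utterance, else 0


def build_pair_examples(
    context: List[str],
    positive: str,
    negatives: List[str],
    turn_sep: str = " [EOT] ",  # hypothetical end-of-turn separator
) -> List[PairExample]:
    """Turn one dialog into binary-classification examples."""
    joined = turn_sep.join(context)

    def fmt(response: str) -> str:
        # BERT-style sentence-pair layout, written out as a plain string.
        return f"[CLS] {joined} [SEP] {response} [SEP]"

    examples = [PairExample(fmt(positive), 1)]
    examples.extend(PairExample(fmt(neg), 0) for neg in negatives)
    return examples
```

Each dialog yields one positive pair and as many negative pairs as there are sampled distractors; a classifier is then trained to score the positive pair highest.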

“…This has been shown to be effective in various tasks including review reading comprehension (Xu et al, 2019) and SuperGLUE (Wang et al, 2019). Existing works on multi-turn response selection (Whang et al, 2019; Gu et al, 2020; Humeau et al, 2020) also adapted this post-training approach and obtained state-of-the-art results. We also employ this post-training method in this work and show its effectiveness in improving performance (Section 5.1).…”

Section: Domain-specific Post-training

mentioning

confidence: 99%

“…Existing works (Wu et al, 2017; Zhou et al, 2018; Tao et al, 2019a; Yuan et al, 2019) have studied utterance-response matching based on attention mechanisms including self-attention (Vaswani et al, 2017). Most recently, as pre-trained language models (e.g., BERT (Devlin et al, 2019), RoBERTa, and ELECTRA (Clark et al, 2020)) have achieved substantial improvements in performance in diverse NLP tasks, multi-turn response selection has also been addressed using such language models (Whang et al, 2019; Lu et al, 2020; Gu et al, 2020; Humeau et al, 2020).…”

mentioning

confidence: 99%


Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Whang, Lee, Oh et al. 2020. Preprint. Self-citation.

In this paper, we study the task of selecting the optimal response given user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) have shown significant improvements in various natural language processing tasks. This and similar response selection tasks can also be solved using such language models by formulating them as dialog-response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in this manner tend to make predictions based on the relatedness of history and candidates, ignoring the sequential nature of multi-turn dialog systems. This suggests that the response selection task alone is insufficient for learning temporal dependencies between utterances. To this end, we propose utterance manipulation strategies (UMS) to address this problem. Specifically, UMS consist of several strategies (i.e., insertion, deletion, and search), which help the response selection model maintain dialog coherence. Further, UMS are self-supervised methods that do not require additional annotation and thus can be easily incorporated into existing approaches. Extensive evaluation across multiple languages and models shows that UMS are highly effective in teaching dialog consistency, which leads to models that advance the state of the art by significant margins on multiple public benchmark datasets.
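The abstract names the insertion strategy without defining it, so the following is only one plausible reading, sketched under that assumption: remove one utterance from a dialog and ask the model to predict the slot where it belongs. The function name `make_insertion_example` and the returned tuple layout are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple


def make_insertion_example(
    utterances: List[str], rng: random.Random
) -> Tuple[List[str], str, int]:
    """Build one self-supervised 'insertion' example from a raw dialog.

    Returns (remaining utterances, removed utterance, target slot), where
    the target slot is the index at which the removed utterance must be
    re-inserted to restore the original order.
    """
    if len(utterances) < 2:
        raise ValueError("need at least two utterances")
    target = rng.randrange(len(utterances))
    remaining = utterances[:target] + utterances[target + 1:]
    return remaining, utterances[target], target


# Usage: the label comes from the dialog itself, so no annotation is needed.
dialog = ["my wifi is down", "which driver do you use?", "iwlwifi", "try reloading it"]
remaining, removed, slot = make_insertion_example(dialog, random.Random(0))
restored = remaining[:slot] + [removed] + remaining[slot:]
```

Because the target slot is derived from the original utterance order, examples like this can be generated from unlabeled dialogs, which is consistent with the abstract's description of UMS as self-supervised.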


“…There are two types of formulation of next-utterance prediction. The first one is to generate the next utterance in a conversation given the conversation history (Zhao and Kawahara 2019; Dziri et al 2018; Hu et al 2019) and the second type is to retrieve the next utterance in the conversation from a large list of response utterances (Lowe et al 2015; Whang et al 2019). These tasks are useful for building a chatbot, aiming to generate or retrieve a good response according to the context of the conversation, while this work aims to recover the structure of the entire conversation.…”

Section: Related Work

mentioning

confidence: 99%

Who did They Respond to? Conversation Structure Modeling using Masked Hierarchical Transformer

Zhu, Feng, Wang et al. 2019. Preprint.

Conversation structure is useful both for understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation structure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds. Previous work usually took a pair of utterances to decide whether one utterance is the parent of the other. We believe the entire ancestral history is a very important information source for making accurate predictions. Therefore, we design a novel masking mechanism to guide the ancestor flow, and leverage the transformer model to aggregate all ancestors to predict parent utterances. Our experiments are performed on the Reddit dataset (Zhang, Culbertson, and Paritosh 2017) and the Ubuntu IRC dataset (Kummerfeld et al. 2019). In addition, we also report experiments on a new larger corpus from the Reddit platform and release this dataset. We show that the proposed model, which takes into account the ancestral history of the conversation, significantly outperforms several strong baselines including the BERT model on all datasets.
