2021
DOI: 10.48550/arxiv.2110.01895
Preprint
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation

Abstract: Recently, there has been a surge of interest in applying pre-trained language models (Pr-LMs) to automatic open-domain dialog evaluation. Pr-LMs offer a promising direction for addressing the multi-domain evaluation challenge. Yet, the impact of different Pr-LMs on the performance of automatic metrics is not well understood. This paper examines 8 different Pr-LMs and studies their impact on three typical automatic dialog evaluation metrics across three different dialog evaluation benchmarks. Specifically, we analyze h…

Cited by 2 publications (1 citation statement)
References 16 publications
“…We adopt the RoBERTa-base model [20] as the pretrained transformer encoder in PoE and the baselines. This is because RoBERTa has been proven to be a powerful text encoder that is beneficial for the automatic dialogue evaluation task in prior works [7], [11], [66], [75]. In addition, we want to have a fair comparison with the existing state-of-the-art metrics, which use either BERT-base or RoBERTa-base, except DEB, which is based on BERT-large [19].…”
Section: Experiments Setup
confidence: 99%