2022
DOI: 10.48550/arxiv.2201.06219
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets

Abstract: Open-domain dialogue systems aim to converse with humans through text, and its research has heavily relied on benchmark datasets. In this work, we first identify the overlapping problem in DailyDialog and OpenSubtitles, two popular open-domain dialogue benchmark datasets. Our systematic analysis then shows that such overlapping can be exploited to obtain fake state-of-the-art performance. Finally, we address this issue by cleaning these datasets and setting up a proper data processing procedure for future rese… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 22 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?