Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2015
DOI: 10.18653/v1/w15-4640

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

Abstract: This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog service…

Cited by 696 publications (931 citation statements)
References 26 publications
“…The human results are separated into AMT non-experts, consisting of paid respondents who have 'Beginner' or no knowledge of Ubuntu terminology; AMT experts, who claimed to have 'Intermediate' or 'Advanced' knowledge of Ubuntu; and Lab experts, who are the non-paid respondents with Ubuntu experience and university-level computer science training. We also presents results on the same task for a state-of-the-art artificial neural (Lowe et al, 2015a) Table 2: Average results on each corpus. 'Number of Users' indicates the number of respondents for each category.…”
Section: Results (mentioning)
confidence: 99%
“…the actual response of the conversation) that are found in the top k responses with the highest rankings according to the model. This task has gained some popularity recently for evaluating dialogue systems (Lowe et al, 2015a;Kadlec et al, 2015).…”
Section: Technical Background on NUC (mentioning)
confidence: 99%
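The ranking measure described in the statement above is commonly called Recall@k. The following is a minimal sketch, assuming each test example has a list of candidate-response scores and exactly one true response; the function name `recall_at_k` and its argument layout are illustrative assumptions, not code from the cited papers.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    true_index: Sequence[int],
    k: int,
) -> float:
    """Fraction of examples whose true response is among the k top-scored candidates.

    scores[i][j] is the model's score for candidate j of example i (assumed layout).
    true_index[i] is the position of the actual response among example i's candidates.
    """
    hits = 0
    for candidate_scores, truth in zip(scores, true_index):
        # Order candidate positions from highest to lowest model score.
        ranked = sorted(
            range(len(candidate_scores)),
            key=lambda j: candidate_scores[j],
            reverse=True,
        )
        # Count a hit when the actual response appears in the top k.
        if truth in ranked[:k]:
            hits += 1
    return hits / len(scores)
```

With 10 candidates per example, calling this with k = 1, 2, and 5 would give the "1 in 10" style of Recall@k figures often reported for response selection.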
“…Finally, after a fully connected layer and cross-entropy loss, a two-category model of matching degree between text and abstract is obtained. This model is based on the simple improvement of the Dual-LSTM model [12] in the Q & A retrieval field, which is better than the direct application of the text digest to match degree, both accuracy and recall. (3) First, the two-category matching model for text and abstract is used to predict the score of the first part of the LSCTS data set and a positive sample with a score of more than 0.95 and a negative sample of less than 0.1 are taken out and added to the training set again.…”
Section: Advances In Intelligent Systems Research Volume 147 (mentioning)
confidence: 99%
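Step (3) in the statement above amounts to threshold-based pseudo-labeling. A minimal sketch, assuming the matching model returns a score in [0, 1] per text-abstract pair; the function name `select_pseudo_labels` and the pair representation are hypothetical, and only the 0.95 and 0.1 thresholds come from the quoted text.

```python
from typing import Any, List, Tuple


def select_pseudo_labels(
    scored: List[Tuple[Any, float]],
    pos_threshold: float = 0.95,  # threshold quoted in the statement
    neg_threshold: float = 0.1,   # threshold quoted in the statement
) -> List[Tuple[Any, int]]:
    """Keep only confidently scored examples and assign them a binary label."""
    selected = []
    for example, score in scored:
        if score > pos_threshold:
            selected.append((example, 1))  # confident positive sample
        elif score < neg_threshold:
            selected.append((example, 0))  # confident negative sample
        # Scores between the two thresholds are left out of the training set.
    return selected
```

The selected pairs would then be appended to the training set before retraining, as the statement describes.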
“…After this, it finds a list of possible synonyms for the word according to its POS. At the end, it checks which word exists in the vocabulary, and according to that, it alters the sentence [15]. For all those words whose synonym does not exist or whose synonyms are not in the vocabulary are left unchanged.…”
Section: Synonyms Finder (mentioning)
confidence: 99%