2018
DOI: 10.1007/978-3-319-91662-0_8

Effective Crowdsourced Generation of Training Data for Chatbots Natural Language Understanding

Abstract: Chatbots are text-based conversational agents. Natural Language Understanding (NLU) models are used to extract meaning and intention from user messages sent to chatbots. The user experience of chatbots largely depends on the performance of the NLU model, which itself largely depends on the initial dataset the model is trained with. The training data should cover the diversity of real user requests the chatbot will receive. Obtaining such data is a challenging task even for big corporations. We introduce a gene…


Cited by 12 publications (8 citation statements)
References 22 publications
“…We based our crowdsourcing process on the generic process for generating chatbot training data introduced by Bapat et al. [3]. It comprises three main steps: 1) preparatory work: clarify use cases for our medication chatbot and create an entity-intent model; 2) creation of orders in the crowdsourcing platform; 3) collection and quality control of the data.…”
Section: Methods
confidence: 99%
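The "entity-intent model" mentioned in the preparatory step pairs each training utterance with an intent label and annotated entity spans. A minimal sketch of what such a crowdsourced training sample could look like is shown below; the intent name, entity type, and utterance are hypothetical illustrations, not taken from the cited medication chatbot.

```python
# Illustrative sketch of an entity-intent training sample for NLU.
# Intent names, entities, and the utterance are hypothetical examples.

def annotate(utterance, intent, entities):
    """Bundle one crowdsourced utterance with its intent and entity spans,
    verifying that each annotated span matches the utterance text."""
    for ent in entities:
        start, end = ent["start"], ent["end"]
        assert utterance[start:end] == ent["value"], "entity span mismatch"
    return {"text": utterance, "intent": intent, "entities": entities}

sample = annotate(
    "When should I take ibuprofen?",
    intent="ask_dosage_time",
    entities=[{"entity": "medication", "value": "ibuprofen",
               "start": 19, "end": 28}],
)
```

Validating spans at collection time catches a common crowdsourcing defect early: annotations that drift out of sync with the utterance text.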
“…Moreover, capitalization and article errors seem abundant (e.g., Example 5 in Table 1). Given that real bot users also make such errors, it is important to have linguistically incorrect utterances in the training samples (Bapat et al., 2018). However, at the very least, detecting linguistic errors can contribute to quality-aware selection of crowd workers.…”
Section: Linguistic Errors
confidence: 99%
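The quality-aware worker selection suggested above could start from coarse heuristic checks on submitted utterances. The sketch below is a hypothetical illustration of such checks, not the cited paper's method; the flag names and rules are assumptions.

```python
# Hypothetical heuristics for flagging obvious linguistic issues in
# crowdsourced utterances; a sketch, not the cited paper's approach.

def linguistic_flags(utterance):
    """Return a list of coarse quality flags for one utterance."""
    flags = []
    stripped = utterance.strip()
    if not stripped:
        return ["empty"]
    if stripped[0].islower():
        flags.append("no_leading_capital")
    if "  " in stripped:
        flags.append("double_space")
    if stripped == stripped.upper() and any(c.isalpha() for c in stripped):
        flags.append("all_caps")
    return flags
```

Aggregating such flags per worker would give a simple signal for quality-aware selection, while still keeping (deliberately or naturally) imperfect utterances in the training set.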
“…In some cases, workers misunderstood the task and provided translations in their own native languages (referred to as Translation issues) (Crossley et al., 2016; Braunger et al., 2018; Bapat et al., 2018), and some mistakenly thought they should provide answers to expressions phrased as questions (referred to as Answering issues), such as Example 9 in Table 1. This occurred even though workers were provided with comprehensive instructions and examples.…”
Section: Task Misunderstanding
confidence: 99%
“…Bapat et al. [2] already presented an end-to-end pipeline for simplifying the NLU training process, in which initial sentences are defined and then extended for subsequent training. While they do not carry out the extension of the training dataset and only classify it into five categories of possible extension methods, our approach mainly targets the class of generating large pools of parameter values.…”
Section: Related Work
confidence: 99%
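The extension class this citing work targets, generating large pools of parameter values, typically amounts to filling utterance templates with every combination of slot values. A minimal sketch under that assumption follows; the template, slot names, and value pools are illustrative, not from either paper.

```python
# Sketch of expanding an utterance template over pools of parameter
# values. Templates, slots, and values here are illustrative assumptions.
from itertools import product

def expand(template, pools):
    """Fill every slot in the template with each combination of pool values."""
    slots = sorted(pools)  # fixed order so combinations are deterministic
    utterances = []
    for combo in product(*(pools[s] for s in slots)):
        utterances.append(template.format(**dict(zip(slots, combo))))
    return utterances

pool = {"med": ["aspirin", "ibuprofen"], "time": ["morning", "evening"]}
sentences = expand("Can I take {med} in the {time}?", pool)
```

The Cartesian product grows quickly with the number of slots and values, which is exactly what makes large value pools an efficient way to multiply a small set of hand-written templates into a big training set.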