Task-oriented virtual assistants (or simply chatbots) are in very high demand these days. They employ third-party APIs to serve end-users via natural language interactions. Chatbots are famed for their easy-to-use interface and gentle learning curve (they require only one of humans' most innate abilities: the use of natural language). Studies of human conversation patterns show, however, that day-to-day dialogues are multi-turn and multi-intent in nature, which pushes the need for chatbots that are more resilient and flexible in this style of conversation. In this paper, we propose leveraging a Conversational State Machine as a core part of a chatbot's conversation engine, formulating conversations as a sequence of states. Each state covers an intent and contains a nested state machine that manages the tasks associated with that intent. This enhanced conversation engine, together with a novel technique for spotting implicit information in dialogues (by exploiting Dialog Acts), allows chatbots to manage tangled conversational situations where most existing chatbot technologies fail.
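The state-within-a-state design described above can be sketched as follows. This is a minimal illustration of the general idea only; all class names, state names, and the fallback policy are assumptions for the sketch, not the authors' actual implementation.

```python
class State:
    """One conversation state: covers a single intent and holds a
    nested (inner) state machine for the tasks tied to that intent."""
    def __init__(self, intent, task_states):
        self.intent = intent
        self.tasks = list(task_states)   # the nested state machine
        self.task_index = 0

    def advance_task(self):
        """Move the nested machine forward one task; return True once
        all tasks for this intent are complete."""
        if self.task_index < len(self.tasks):
            self.task_index += 1
        return self.task_index >= len(self.tasks)


class ConversationStateMachine:
    """Outer machine: a conversation is a sequence of states, one per
    intent, so multi-intent dialogues become state transitions."""
    def __init__(self):
        self.states = []
        self.current = None

    def push_intent(self, intent, task_states):
        """A new intent interrupts the conversation: open its state."""
        state = State(intent, task_states)
        self.states.append(state)
        self.current = state

    def handle_turn(self):
        """Advance the active intent's nested machine by one task; on
        completion, resume the most recent still-open intent."""
        if self.current and self.current.advance_task():
            self.states.remove(self.current)
            self.current = self.states[-1] if self.states else None


# Usage: a user starts booking a flight, then interleaves a second
# intent; once it finishes, the machine resumes the earlier one.
csm = ConversationStateMachine()
csm.push_intent("book_flight", ["ask_destination", "ask_date", "confirm"])
csm.push_intent("check_weather", ["ask_city"])
csm.handle_turn()                            # completes check_weather
assert csm.current.intent == "book_flight"   # earlier intent resumes
```

The nested machine is what lets an interrupted intent keep its task progress while the outer machine switches focus, which is the "tangled conversation" case the abstract targets.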
Building task-oriented bots requires mapping a user utterance to an intent with its associated entities to serve the request. Doing so is not easy, since it requires large quantities of high-quality and diverse training data to learn how to map all possible variations of utterances with the same intent. Crowdsourcing may be an effective, inexpensive, and scalable technique for collecting such large datasets. However, the diversity of the results suffers from the priming effect (i.e., workers are more likely to reuse words from the sentence they are asked to paraphrase). In this paper, we leverage priming as an opportunity rather than a threat: we dynamically generate word suggestions to motivate crowd workers towards producing diverse utterances. The key challenge is to make suggestions that can improve diversity without resulting in semantically invalid paraphrases. To achieve this, we propose a probabilistic model that generates continuously improved versions of word suggestions that balance diversity and semantic relevance. Our experiments show that the proposed approach improves the diversity of crowdsourced paraphrases.
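The diversity-versus-relevance trade-off described above can be sketched with a simple scoring rule. The toy relevance scores and the linear scoring formula are illustrative assumptions, not the paper's actual probabilistic model; in the real system, relevance would come from something like embedding similarity to the seed utterance.

```python
from collections import Counter

def score_suggestions(candidates, relevance, collected_paraphrases, alpha=0.5):
    """Rank candidate word suggestions: reward relevance to the seed
    utterance, penalize words already frequent in the paraphrases
    collected so far (to push workers toward diversity)."""
    usage = Counter(w for p in collected_paraphrases for w in p.split())
    total = sum(usage.values()) or 1
    scores = {}
    for w in candidates:
        novelty = 1.0 - usage[w] / total          # rarer words score higher
        scores[w] = alpha * relevance.get(w, 0.0) + (1 - alpha) * novelty
    return sorted(scores, key=scores.get, reverse=True)

# Seed intent: "book a flight". Relevance scores are assumed given.
relevance = {"reserve": 0.9, "purchase": 0.8, "flight": 0.95, "grab": 0.4}
collected = ["book a flight to paris", "book my flight tomorrow"]
ranked = score_suggestions(relevance.keys(), relevance, collected)
# "flight" is the most relevant word but is already overused in the
# collected paraphrases, so "reserve" outranks it here.
assert ranked[0] == "reserve"
```

Because the suggestion scores are recomputed against the growing pool of collected paraphrases, the suggestions shown to each new worker shift away from already-common words, which mirrors the "continuously improved" aspect described in the abstract.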
Developing bots demands high-quality training samples, typically in the form of user utterances and their associated intents. Given the fuzzy nature of human language, such datasets must ideally cover all possible utterances for each single intent. Crowdsourcing has been widely used to collect such inclusive datasets by paraphrasing an initial utterance. However, the quality of this approach often suffers from various issues, particularly language errors produced by unqualified crowd workers. Moreover, since workers are tasked with writing open-ended text, it is very challenging to automatically assess the quality of paraphrased utterances. In this paper, we investigate common crowdsourced paraphrasing issues and propose an annotated dataset, called Para-Quality, for detecting such quality issues. We also investigate existing tools and services to provide baselines for detecting each category of issue. In all, this work presents a data-driven view of incorrect paraphrases produced during the bot development process and paves the way toward automatic detection of unqualified paraphrases.