Lasguido NIO†a), Sakriani SAKTI†b), Graham NEUBIG†c), Koichiro YOSHINO†d), Nonmembers, and Satoshi NAKAMURA†e), Member
SUMMARY    In this work, we propose new statistical models for building robust dialog systems that use neural networks to either retrieve or generate dialog responses based on existing data sources. For the retrieval task, we propose an approach that performs paraphrase identification during the retrieval process, employing recursive autoencoders and dynamic pooling to determine whether two sentences of arbitrary length have the same meaning. For both the retrieval and generation tasks, we propose a model based on long short-term memory (LSTM) neural networks that first uses an LSTM encoder to read the user's utterance into a continuous vector-space representation, then uses an LSTM decoder to generate the most probable word sequence. An evaluation based on objective and subjective metrics shows that, compared to standard example-based dialog baselines, the proposed approaches are better able to deal with user inputs that are not well covered in the database.
key words: example-based dialog system, dialog system, response retrieval, response generation, long short-term memory neural network
Introduction

Natural language dialog systems promise to establish efficient interfaces for communication between humans and computers [1]-[5]. One way to create a simple yet effective dialog system is example-based dialog modeling (EBDM) [6]-[9]. EBDM is a data-driven approach for creating dialog systems that choose how to respond to user input based on a large database of examples, each consisting of an utterance and a corresponding natural reply to that utterance. Given a user input, the system performs response retrieval, selecting the highest-scoring response from the existing utterances in the database. EBDM presents a lightweight alternative to more conventional methods for constructing dialog systems, as it only requires the construction of an example base, and it has also been shown to be effective in a number of dialog scenarios. In particular, this approach is able to generate highly natural output when an example that closely matches the user query is included in the database and that example is appropriately retrieved [6].

However, given the sparsity of human language and a finite query-response database, we can easily imagine that such a system may fail when attempting to respond to a user utterance that does not closely match one of the examples in the database. We define this kind of problem as the out-of-example (OOE) problem. One way to overcome this problem is response generation. This approach uses the dialog examples as data to train a model that can generate responses not included in the database. Generation has the potential to be more robust to OOE user inputs, but it may also generate responses that are incomprehensible to human users [13]. Generation models originally adapted statistical machine translation (SMT) to utilize a query-response dialog ...
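The basic EBDM retrieval loop described above can be sketched in a few lines. The following is only a minimal illustration, not the proposed method: it assumes a simple bag-of-words cosine similarity as the matching score (the models in this work use much richer representations), and the toy example base and function names are hypothetical.

```python
from collections import Counter
import math

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_response(query, example_base):
    """Return the reply paired with the best-matching example utterance."""
    best_utt, best_resp = max(example_base,
                              key=lambda ex: bow_cosine(query, ex[0]))
    return best_resp

# Toy example base of (utterance, reply) pairs.
example_base = [
    ("what time does the store open", "It opens at nine in the morning."),
    ("how is the weather today", "It is sunny and warm."),
]
print(retrieve_response("what is the weather like today", example_base))
# The query overlaps heavily with the second example, so its reply is chosen.
```

A surface-level score like this is exactly what fails in the OOE setting: a paraphrased query with little word overlap scores near zero against every example, which motivates the paraphrase-identification and generation models proposed here.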