Training an adaptive dialogue policy for interactive learning of visually grounded word meanings

Yu, Yanchao; Eshghi, Arash; Lemon, Oliver

doi:10.18653/v1/w16-3643

Cited by 17 publications

(34 citation statements)

References 28 publications


“…[Table fragment: Utterance-level: Accuracy 77.98%, KLD 0.2338; Act-level: Accuracy 84.96%, KLD 0.188] In order to demonstrate how the BURCHAK corpus can be used, we train and evaluate a prototype interactive learning agent using Reinforcement Learning (RL) on the collected data. We follow previous task and experiment settings (see (Yu et al, 2016b; Yu et al, 2016c)) to compare the learned RL-based agent with a rule-based agent with the best performance from previous work. Instead of using hand-crafted dialogue examples as before, here we train the RL agent in interaction with the user simulation, itself trained from the BURCHAK data as above.…”

Section: Simulation (mentioning)

confidence: 99%
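
The KLD figures in the statement above presumably denote Kullback-Leibler divergence between the simulation's output distribution and the distribution observed in the corpus; the excerpt does not expand the abbreviation, so this reading is an assumption. A minimal sketch of that measure over discrete outcomes follows. The function name, the dialogue-act labels, and the probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """Return D_KL(P || Q) in nats for two discrete distributions.

    Outcomes missing from q are floored at eps to avoid division by zero;
    this smoothing choice is an assumption made for the sketch.
    """
    total = 0.0
    for outcome, p_prob in p.items():
        if p_prob > 0.0:
            q_prob = max(q.get(outcome, 0.0), eps)
            total += p_prob * math.log(p_prob / q_prob)
    return total


# Hypothetical dialogue-act distributions (illustrative values only).
corpus_acts = {"inform": 0.5, "ask": 0.3, "ack": 0.2}
simulated_acts = {"inform": 0.4, "ask": 0.4, "ack": 0.2}

divergence = kl_divergence(corpus_acts, simulated_acts)
print(f"KL divergence: {divergence:.4f}")
```

A value near zero indicates that the two distributions are close; identical distributions give exactly zero.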

The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

Yu, Eshghi, Mills et al. 2017

Proceedings of the Sixth Workshop on Vision and Language


We motivate and describe a new freely available human-human dialogue data set for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows comparable performance to a rule-based system built previously.


“…Our work is similar in spirit to e.g. (Roy, 2002; Skocaj et al, 2011) but advances it in several aspects (Yu et al, 2016).…”

Section: Introduction (mentioning)

confidence: 78%

“…Following previous work (Yu et al, 2016), here we use a positive confidence threshold, which determines when the agent believes its own predictions. For instance, the learner can ask either polar or WH-questions about an attribute if its confidence score is higher than a certain threshold; otherwise, there should be no interaction about that attribute.…”

Section: When To Learn: Adaptive Confidence Threshold (mentioning)

confidence: 99%
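
The confidence-threshold rule quoted above can be sketched as follows, under stated assumptions. The statement says only that the learner may ask polar or WH-questions about an attribute when its confidence score exceeds a threshold, and otherwise does not interact about that attribute. The function name, the attribute names, the scores, and the threshold value below are hypothetical, and the sketch does not model how the agent chooses between question types.

```python
from typing import Dict, List


def attributes_open_for_questions(confidences: Dict[str, float], threshold: float) -> List[str]:
    """Return the attributes whose confidence score is above the threshold.

    Attributes at or below the threshold are skipped, mirroring the quoted
    rule that there should be no interaction about them.
    """
    return [name for name, score in confidences.items() if score > threshold]


# Hypothetical classifier confidences for one object (illustrative values only).
scores = {"colour": 0.82, "shape": 0.41}
print(attributes_open_for_questions(scores, threshold=0.5))
```

With these illustrative values, only the colour attribute passes the threshold, so the learner would ask about colour and stay silent about shape.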

VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System)

Eshghi, Lemon 2017

Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue


We present VOILA: an optimised, multimodal dialogue agent for interactive learning of visually grounded word meanings from a human user. VOILA is: (1) able to learn new visual categories interactively from users from scratch; (2) trained on real human-human dialogues in the same domain, and so is able to conduct natural spontaneous dialogue; (3) optimised to find the most effective trade-off between the accuracy of the visual categories it learns and the cost it incurs to users. VOILA is deployed on Furhat, a humanlike, multi-modal robot head with back-projection of the face, and a graphical virtual character.


“…On the other hand, other models assume a much more explicit connection between symbols (either words or predicate symbols of some logical language) and perceptions (Kennington and Schlangen, 2015; Yu et al, 2016c; Skocaj et al, 2016; Dobnik et al, 2014; Matuszek et al, 2014). In this line of work, representations are both compositional and transparent, with their constituent atomic parts grounded individually in perceptual classifiers.…”

Section: Related Work (mentioning)

confidence: 99%

Learning how to Learn: An Adaptive Dialogue Agent for Incrementally Learning Visually Grounded Word Meanings

Yu, Eshghi, Lemon 2017

Proceedings of the First Workshop on Language Grounding for Robotics


We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users, and achieve good learning performance (i.e. accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus, a human-human dialogue dataset for the visual learning task. The results show that: 1) the learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies.
