This article describes a new tool for extracting question-answer pairs from text articles, and reports three experiments that investigate how suitable this technique is for supplying knowledge to conversational characters. Experiment 1 demonstrates the feasibility of our method by creating characters for 14 distinct topics and evaluating them using hand-authored questions. Experiment 2 evaluates three of these characters using questions collected from naive participants, showing that the generated characters provide full or partial answers to about half of the questions asked. Experiment 3 adds automatically extracted knowledge to an existing, hand-authored character, demonstrating that augmented characters can answer questions about new topics, though with some degradation in the ability to answer questions about topics the original character was trained to answer. Overall, the results show that question generation is a promising method for creating or augmenting a question-answering conversational character from an existing text.
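The extraction pipeline itself is only summarized here. As one concrete illustration, the sketch below shows a minimal template-based approach to turning declarative sentences into question-answer pairs; the regular-expression sentence splitter and the copular "X is Y" heuristic are illustrative assumptions, not the authors' tool.

```python
import re

def split_sentences(text: str) -> list:
    # Naive sentence splitter; a real system would use an NLP toolkit.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def make_qa_pair(sentence: str):
    """Turn a simple 'X is Y' sentence into a (question, answer) pair.

    Illustrative copular heuristic only; it ignores the many other
    constructions a real question-generation tool must handle.
    """
    m = re.match(r"^(?P<subj>[A-Z][\w\s]+?)\s+is\s+.+\.?$", sentence)
    if m is None:
        return None
    return f"What is {m.group('subj').strip()}?", sentence

def extract_qa_pairs(article: str) -> list:
    pairs = []
    for sentence in split_sentences(article):
        pair = make_qa_pair(sentence)
        if pair is not None:
            pairs.append(pair)
    return pairs

if __name__ == "__main__":
    text = ("The Sphinx is a limestone statue near Giza. "
            "Its builders finished it around 2500 BC.")
    for question, answer in extract_qa_pairs(text):
        print(question, "->", answer)
```

A character built this way would answer an incoming user question by retrieving the stored pair whose question best matches it, which is why coverage in the experiments above depends on how many extractable sentences the source text contains.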
Deep reinforcement learning (RL) methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. We focus our analyses on the deep Q-networks (DQNs) that kicked off the modern era of deep RL. Taken together, these results call into question the extent to which DQNs learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.
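One way to operationalize the off-policy probing described above is to compare a Q-network's greedy action on states from its own trajectories against the same states under small random (non-adversarial) perturbations. The following sketch is a minimal illustration of that measurement under stated assumptions: the `QNet` architecture is a toy stand-in for a trained DQN, and random vectors stand in for recorded on-policy states. It is not the authors' evaluation protocol.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Toy Q-network standing in for a trained DQN."""
    def __init__(self, state_dim: int = 8, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

@torch.no_grad()
def action_agreement(qnet: nn.Module, states: torch.Tensor,
                     noise_scale: float) -> float:
    """Fraction of states whose greedy action survives a small random
    (non-adversarial) perturbation of the state."""
    greedy = qnet(states).argmax(dim=1)
    perturbed = states + noise_scale * torch.randn_like(states)
    return (greedy == qnet(perturbed).argmax(dim=1)).float().mean().item()

if __name__ == "__main__":
    torch.manual_seed(0)
    qnet = QNet()
    # Placeholder for states recorded from the agent's own trajectories.
    on_policy_states = torch.randn(1000, 8)
    for scale in (0.001, 0.01, 0.1):
        agree = action_agreement(qnet, on_policy_states, scale)
        print(f"noise scale {scale}: greedy-action agreement {agree:.3f}")
```

If the network had learned a robust, generalized representation, agreement should stay high for perturbations much smaller than the typical distance between distinct on-policy states; sharp drops indicate brittle decision boundaries of the kind the abstract describes.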
Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high-quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source subset of Atari environments redesigned for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments.
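The kind of experiment such a platform enables is a state-level intervention: pause an episode, edit the environment state directly, and observe how the agent's return changes. The sketch below demonstrates the idea with a self-contained toy environment; the `get_state`/`set_state` hooks and the knock-back perturbation are illustrative assumptions, not TOYBOX's actual API.

```python
class ToyEnv:
    """Tiny corridor environment with an inspectable, mutable state.

    This stands in for the kind of state-level access an experimental
    platform can provide; it is not TOYBOX's actual interface.
    """
    def __init__(self, length: int = 10):
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        # Action 1 moves right, anything else moves left (floored at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length
        reward = 1.0 if done else -0.01  # small per-step cost
        return self.pos, reward, done

    # State-level hooks that make intervention experiments possible.
    def get_state(self) -> dict:
        return {"pos": self.pos}

    def set_state(self, state: dict) -> None:
        self.pos = state["pos"]

def run_episode(env, policy, intervene_at=None, max_steps=100) -> float:
    """Run one episode, optionally intervening on the state at a fixed step."""
    obs, total = env.reset(), 0.0
    for t in range(max_steps):
        if t == intervene_at:
            state = env.get_state()
            state["pos"] = max(0, state["pos"] - 3)  # knock the agent back
            env.set_state(state)
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total

if __name__ == "__main__":
    env = ToyEnv()
    policy = lambda obs: 1  # trivial "agent": always move right
    print(f"return without intervention: {run_episode(env, policy):.2f}")
    print(f"return with intervention:    {run_episode(env, policy, intervene_at=5):.2f}")
```

In a black-box Atari emulator the analogous manipulation (repositioning the ball, removing a brick) is effectively impossible, which is the gap a reimplemented environment suite is designed to fill.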