2019
DOI: 10.48550/arxiv.1902.07685
Preprint

World Discovery Models

Abstract: As humans, we are driven by a strong desire to seek novelty in our world. Upon observing a novel pattern, we are also capable of refining our understanding of the world based on the new information: humans can discover their world. The human mind's outstanding ability for discovery has led to many breakthroughs in science, art and technology. Here we investigate the possibility of building an agent capable of discovering its world using modern AI technology. In particular, we introduce NDIGO, Neural Differential Information Gain Optimisation, …

Cited by 4 publications (4 citation statements) | References 40 publications
“…In RL, this intuition is transposed by taking the current state and action to predict the next state's representation; the resulting prediction error is then turned into the intrinsic reward signal. Approaches mostly differ in how the state representation is learned. Achiam and Sastry (2017) and Azar et al. (2019) compute the intrinsic reward across multiple timestep predictions to better estimate information gain. Yet such intrinsic rewards based on prediction errors may attract the agent into irrelevant yet unpredictable transitions.…”
Section: Related Work (mentioning)
confidence: 99%
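The snippet above describes the standard curiosity-style recipe: encode the state, predict the next state's representation from the current representation and action, and turn the prediction error into an intrinsic reward. Below is a minimal PyTorch sketch of that recipe; the module names, network sizes, and the squared-error form of the reward are illustrative assumptions, not the exact design of NDIGO or of the cited papers.

```python
# A minimal sketch (not any cited paper's exact method): a forward dynamics
# model predicts the next-state embedding from the current embedding and the
# action; its prediction error is used as the intrinsic reward.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    def __init__(self, embed_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, state_embed: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Predict the next-state embedding from (state embedding, action).
        return self.net(torch.cat([state_embed, action], dim=-1))

def intrinsic_reward(forward_model: ForwardModel,
                     state_embed: torch.Tensor,
                     action: torch.Tensor,
                     next_state_embed: torch.Tensor) -> torch.Tensor:
    """Prediction error of the next-state representation, one value per transition."""
    with torch.no_grad():
        pred = forward_model(state_embed, action)
    return 0.5 * (pred - next_state_embed).pow(2).mean(dim=-1)
```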
“…Information gain can be estimated by planning (Sun et al., 2011) or from past environment interaction (Schmidhuber, 1991). State representations lead to agents that disambiguate unobserved environment states, for example by opening doors to see objects behind them, such as in active inference (Da Costa et al., 2020), NDIGO (Azar et al., 2019), and DVBF-LM (Mirchev et al., 2018). Model parameters lead to agents that discover the rules of their environment, such as in active inference (Friston et al., 2015), VIME (Houthooft et al., 2016), MAX (Shyam et al., 2018), and Plan2Explore (Sekar et al., 2020).…”
Section: Information Gain (mentioning)
confidence: 99%
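The snippet above separates information gain about latent states from information gain about model parameters. One widely used approximation to the latter, in the spirit of MAX and Plan2Explore, is the disagreement of an ensemble of learned dynamics models: where the members disagree, new data would be informative about the parameters. The sketch below illustrates that idea; the ensemble architecture and the variance-based reward are assumptions for illustration, not the exact objectives of the cited works.

```python
# A minimal sketch of one common approximation to parameter information gain:
# the disagreement (variance) of an ensemble of learned dynamics models.
import torch
import torch.nn as nn

class EnsembleDynamics(nn.Module):
    def __init__(self, n_models: int, embed_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(
                nn.Linear(embed_dim + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, embed_dim),
            )
            for _ in range(n_models)
        ])

    def forward(self, state_embed: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state_embed, action], dim=-1)
        # Stack per-model predictions: shape (n_models, batch, embed_dim).
        return torch.stack([m(x) for m in self.models], dim=0)

def disagreement_reward(ensemble: EnsembleDynamics,
                        state_embed: torch.Tensor,
                        action: torch.Tensor) -> torch.Tensor:
    """Variance across ensemble members, averaged over embedding dimensions."""
    with torch.no_grad():
        preds = ensemble(state_embed, action)
    return preds.var(dim=0).mean(dim=-1)
```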
“…Neural population decoding. Probing an agent with a QA decoder can be viewed as a variant of neural population decoding, used as an analysis tool in neuroscience (Georgopoulos et al., 1986; Bialek et al., 1991; Salinas & Abbott, 1994) and more recently in deep learning (Guo et al., 2018; Gregor et al., 2019; Azar et al., 2019; Alain & Bengio, 2016; Conneau et al., 2018; Tenney et al., 2019). The idea is to test whether specific information is encoded in a learned representation by feeding the representation as input to a probe network, generally a classifier trained to extract the desired information.…”
Section: Background and Related Work (mentioning)
confidence: 99%
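The probing procedure described above is straightforward to sketch: freeze the learned representation, train a small classifier on top of it, and read decodability off the probe's accuracy. The following sketch assumes a linear probe and precomputed representation tensors; the function names and training loop are illustrative, not the setup of any specific cited paper.

```python
# A minimal sketch of representation probing: train a small classifier ("probe")
# on frozen representations to test whether a given piece of information is
# decodable from them.
import torch
import torch.nn as nn

def train_probe(representations: torch.Tensor,   # (n, dim), frozen features
                labels: torch.Tensor,            # (n,), integer class labels
                num_classes: int,
                epochs: int = 50,
                lr: float = 1e-3) -> nn.Module:
    probe = nn.Linear(representations.shape[-1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    reps = representations.detach()              # never backprop into the encoder
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(reps), labels)
        loss.backward()
        opt.step()
    return probe

def probe_accuracy(probe: nn.Module,
                   representations: torch.Tensor,
                   labels: torch.Tensor) -> float:
    # High accuracy suggests the probed information is linearly decodable
    # from the representation; chance-level accuracy suggests it is not.
    with torch.no_grad():
        preds = probe(representations).argmax(dim=-1)
    return (preds == labels).float().mean().item()
```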