2016
DOI: 10.48550/arxiv.1605.03142
Preprint

Self-Modification of Policy and Utility Function in Rational Agents

Cited by 4 publications (9 citation statements)
References: 0 publications
“…(Much like how a human would probably "enjoy" taking addictive substances once they do, but not want to be an addict.) Similar ideas are explored in [50,71].…”
Section: Avoiding Reward Hacking
mentioning
confidence: 90%
“…Corrigibility and self-preservation. TI-unaware agents are weakly corrigible (Everitt, Filan, et al., 2016; Orseau and Armstrong, 2016), as they have no incentive to prevent the designer from updating the reward function. Unfortunately, TI-unaware agents may not be strongly corrigible.…”
Section: Solution 2: TI-unaware Agents
mentioning
confidence: 99%
“…Previously called corruption aware (Everitt, 2018). [12] And others formally verified (Everitt, Filan, et al., 2016; Hibbard, 2012; Orseau and Ring, 2011). [13] Previously called corruption unaware (Everitt, 2018).…”
mentioning
confidence: 99%
“…Bayesian, history-based agents have been used to formalize AGI in the so-called AIXI framework (Hutter, 2005; also discussed in Section 2.1). Extensions of this framework have been developed for studying goal alignment (Everitt and Hutter, 2018a), multi-agent interaction (Leike, Taylor, et al., 2016), space-time embeddedness (Orseau and Ring, 2012), self-modification (Everitt, Filan, et al., 2016; Orseau and Ring, 2011), observation modification (Ring and Orseau, 2011), self-duplication (Orseau, 2014a,b), knowledge seeking (Orseau, 2014c), decision theory (Everitt, Leike, et al., 2015), and others (Everitt and Hutter, 2018b). Some aspects of reasoning are swept under the rug by AIXI and Bayesian optimality.…”
Section: Formalizing AGI
mentioning
confidence: 99%