An AGI Modifying Its Utility Function in Violation of the Strong Orthogonality Thesis

Miller, James D.; Yampolskiy, Roman V.; Häggström, Olle

doi:10.3390/philosophies5040040

Cited by 7 publications

(9 citation statements)

References 12 publications


“…History is replete with examples of 'power corrupts and absolute power corrupts absolutely' as well as singleton coercive government going wrong. The methods presented here are envisioned in an AGI society with specialized labor and voluntary, negotiated interactions that include permission to alter societal shared values that impact individual utility functions in order to prevent utility function changes from threatening present and future societies [10,14].…”

Section: 'Hard Take-off' and Automated AGI Government (mentioning)

confidence: 99%

“…As just alluded, a single heuristic, such as 'terminate all humans', or ethic, such as 'terminate all agents using resources inefficiently as defined by the following metric', added to a BCS could result in realization of the AGI existential threat, as could universal drives causing AGI to alter its utility function [14]. Thus, any alteration, especially forgery, of ethics modules or BCS must be detected.…”

Section: Detection of Behavior Control System (BCS) Forgery via Acyclic Graphs (mentioning)

confidence: 99%

“…The ability of AGI to self-correct or to assist its designers in correction of value alignment and behavior is called 'corrigibility' by Soares [21]. Miller et al review and examine how corrigibility can result in mis-alignment of values [14].…”

Section: Probabilistically Checkable Proofs (PCP Theorem) (mentioning)

confidence: 99%

“…A systematic procedure could be followed as in Carlson ([10],Methods Sec. 2.1), testing BCS to flag and eliminate the possibility of behavioral pathways to dangerous AGI taken from enumerations given by, e.g., Asimov [60], Turchin [6], Bostrom [3], Yampolskiy [2], Tegmark [20]), and Miller et al [14].…”

Section: Probabilistically Checkable Proofs (PCP Theorem) (mentioning)

confidence: 99%


Provably Safe Artificial General Intelligence via Interactive Proofs

Carlson

2021

Preprint


Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first generation $\mathrm{AGI}_1$ rapidly triggers a succession of more powerful $\mathrm{AGI}_n$ that differ dramatically in their computational capabilities ($\mathrm{AGI}_n \ll \mathrm{AGI}_{n+1}$). No proof exists that AGI will benefit humans or of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., $2^{-100}$). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce $\mathrm{AGI}_n \leftrightarrow \mathrm{AGI}_{n+1}$ interaction hazards to an acceptably low level.



Provably Safe Artificial General Intelligence via Interactive Proofs

Carlson

2021

Preprint


“…As just alluded, a single heuristic, such as 'terminate all humans', or ethic, such as 'terminate all agents using resources inefficiently as defined by the following metric', added to a BCS could result in realization of the AGI existential threat, as could universal drives, such as simply wanting to improve its ability to achieve goals, causing AGI to alter its utility function [50]. Thus, any alteration, especially forgery, of ethics modules or BCS must be detected.…”

Section: Detection of Behavior Control System (BCS) Forgery via Acyclic Graphs (mentioning)

confidence: 99%

Provably Safe Artificial General Intelligence via Interactive Proofs

Carlson

2021

Philosophies


Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first generation $\mathrm{AGI}_1$ rapidly triggers a succession of more powerful $\mathrm{AGI}_n$ that differ dramatically in their computational capabilities ($\mathrm{AGI}_n \ll \mathrm{AGI}_{n+1}$). No proof exists that AGI will benefit humans or of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., $2^{-100}$). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce $\mathrm{AGI}_n \leftrightarrow \mathrm{AGI}_{n+1}$ interaction hazards to an acceptably low level.


AI Risk Skepticism

Yampolskiy

2022

Studies in Applied Philosophy, Epistemology and Rational Ethics

