Chris van Merwijk scite author profile

Chris van Merwijk

3 Publications

28 Citation Statements Received

44 Citation Statements Given

Affiliations

University of Oxford

Publications

Risks from Learned Optimization in Advanced Machine Learning Systems

Hubinger, Merwijk, Mikulik et al. 2019. Preprint.

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer, a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be, how will it differ from the loss function it was trained under, and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research.

A Complete Criterion for Value of Information in Soluble Influence Diagrams

Merwijk, Carey, Everitt. 2022. AAAI.

Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems. A key building block for this analysis is a graphical criterion for value of information (VoI). This paper establishes the first complete graphical criterion for VoI in influence diagrams with multiple decisions. Along the way, we establish two techniques for proving properties of multi-decision influence diagrams: ID homomorphisms are structure-preserving transformations of influence diagrams, while a Tree of Systems is a collection of paths that captures how information and control can flow in an influence diagram.

A Complete Criterion for Value of Information in Soluble Influence Diagrams

Merwijk, Carey, Everitt. 2022. Preprint.

Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems. A key building block for this analysis is a graphical criterion for value of information (VoI). This paper establishes the first complete graphical criterion for VoI in influence diagrams with multiple decisions. Along the way, we establish two important techniques for proving properties of multi-decision influence diagrams: ID homomorphisms are structure-preserving transformations of influence diagrams, while a Tree of Systems is a collection of paths that captures how information and control can flow in an influence diagram.
