Daniel Filan scite author profile

Daniel Filan

5 Publications

49 Citation Statements Received

61 Citation Statements Given

Affiliations

Australian National University

Publications

Self-Modification of Policy and Utility Function in Rational Agents

Everitt, Filan, Daswani, et al. 2016

Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers) will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby 'escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.

Self-Modification of Policy and Utility Function in Rational Agents

Everitt, Filan, Daswani, et al. 2016

Preprint

Pruned Neural Networks are Surprisingly Modular

Filan, Hod, Wild, et al. 2020

Preprint

Clusterability in Neural Networks

Filan, Casper, Hod, et al. 2021

Preprint

The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.
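
As a rough illustration of the clusterability idea in the abstract above (strong connectivity inside groups, weak connectivity between them), the sketch below scores a partition of a small weighted graph with normalized cut. The abstract does not name the metric, so normalized cut is an assumed choice here, and the function name `normalized_cut` and the toy graph are hypothetical. Lower scores mean the partition separates the graph more cleanly.

```python
from typing import Dict, List, Tuple

# An undirected weighted edge: (node_u, node_v, weight).
Edge = Tuple[int, int, float]


def normalized_cut(edges: List[Edge], labels: List[int]) -> float:
    """Sum over clusters of (weight leaving the cluster) / (weighted degree of the cluster).

    Assumption: edge weights stand in for absolute connection strengths
    between neurons; this is an illustrative metric, not necessarily the
    one used in the paper.
    """
    cut: Dict[int, float] = {}     # weight of edges crossing out of each cluster
    volume: Dict[int, float] = {}  # total weighted degree of each cluster
    for u, v, w in edges:
        w = abs(w)
        for node in (u, v):
            c = labels[node]
            volume[c] = volume.get(c, 0.0) + w
        if labels[u] != labels[v]:
            for node in (u, v):
                c = labels[node]
                cut[c] = cut.get(c, 0.0) + w
    return sum(cut.get(c, 0.0) / vol for c, vol in volume.items() if vol > 0)


# Toy graph: two tightly connected triangles joined by one weak edge.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),  # triangle A
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),  # triangle B
    (2, 3, 0.1),                            # weak bridge between A and B
]

good = normalized_cut(edges, [0, 0, 0, 1, 1, 1])  # partition along the weak bridge
bad = normalized_cut(edges, [0, 1, 0, 1, 0, 1])   # partition that splits both triangles

print(f"n-cut along the bridge: {good:.4f}")
print(f"n-cut across triangles: {bad:.4f}")
```

The partition that follows the weak bridge scores much lower than the one that cuts through both triangles, which is the sense in which a graph (or a network's weight graph) can be called more or less clusterable.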

Quantifying Local Specialization in Deep Neural Networks

Hod, Filan, Casper, et al. 2021

Preprint
