A central goal in systems neuroscience is to understand the functions performed by neural circuits. Previous top-down models addressed this question by comparing the behaviour of an ideal model circuit, optimised to perform a given function, with neural recordings. However, this requires guessing in advance what function is being performed, which may not be possible for many neural systems. To address this, we propose a new framework for optimising a recurrent network using multi-agent reinforcement learning (RL). In this framework, a reward function quantifies how desirable each state of the network is for performing a given function. Each neuron is treated as an 'agent', which optimises its responses so as to drive the network towards rewarded states. Three applications follow from this. First, one can use multi-agent RL algorithms to optimise a recurrent neural network to perform diverse functions (e.g. efficient sensory coding or motor control). Second, one could use inverse RL to infer the function of a recorded neural network from data. Third, the theory predicts how neural networks should adapt their dynamics to maintain the same function when the external environment or network structure changes. This yields testable predictions about how neural network dynamics adapt to deal with cell death and/or varying sensory stimulus statistics.

environment [4], or memory storage [5]. Nevertheless, it has remained difficult to make quantitative contact between top-down model predictions and data; in particular, to rigorously test which (if any) of the proposed functions is actually being carried out by a real neural circuit. The first problem is that a pure top-down approach requires us to hypothesise the function performed by a given neural circuit, which is often not possible. Second, even if our hypothesis is correct, there may be multiple ways for a neural circuit to perform the same function, so the predictions of the top-down model may not match the data.

Here we propose a new framework for optimal coding by a recurrent neural network that aims to overcome these problems. First, we show how optimal coding by a recurrent neural network can be recast as a multi-agent reinforcement learning (RL) problem [14-18] (Fig 1). In this framework, a reward function quantifies how desirable each state of the network is for performing a given computation. Each neuron is then treated as a separate 'agent', which optimises its responses (i.e. when to fire a spike) so as to drive the network towards rewarded states, given a constraint on the information each neuron encodes about its inputs. This framework is very general: different choices of reward function result in the network performing diverse functions, from efficient coding to decision making and optimal control, and it thus has the potential to unify many previous theories of neural coding. Next, we show how our proposed framework could be used to tackle the inverse problem of inferring the reward function from the observed n...
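To make the neuron-as-agent setup concrete, the following is a minimal sketch, not the algorithm developed in this work: N binary neurons each follow a logistic spiking policy driven by the other neurons' states, and each neuron independently applies a one-step REINFORCE-style policy-gradient update to increase a shared, hand-chosen reward (here, a sparsity-favouring reward standing in for efficient coding). The network size, reward function, learning rate, and the omission of the per-neuron information constraint are all simplifying assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 8                              # number of neurons (agents); assumed
    W = rng.normal(0.0, 0.1, size=(N, N))  # row i: weights neuron i uses to read the network
    b = np.zeros(N)                    # per-neuron spiking biases

    def reward(s):
        # Hypothetical reward over network states: favour sparse activity,
        # a stand-in for efficient coding; any r(s) could be substituted.
        target_rate = 0.2
        return -abs(s.mean() - target_rate)

    def spike_probs(s):
        # Each neuron spikes with a logistic probability given the current state.
        return 1.0 / (1.0 + np.exp(-(W @ s + b)))

    eta = 0.05                         # learning rate; assumed
    baseline = 0.0                     # running reward baseline to reduce gradient variance
    s = rng.integers(0, 2, size=N).astype(float)
    for t in range(20000):
        p = spike_probs(s)
        s_new = (rng.random(N) < p).astype(float)  # each agent samples spike/no-spike
        r = reward(s_new)                          # one shared scalar reward
        adv = r - baseline
        # Independent REINFORCE update per neuron: for a Bernoulli policy,
        # d log P(s_i) / d h_i = s_i - p_i, so the gradient for row i of W
        # is (s_new[i] - p[i]) * s, and for b[i] it is (s_new[i] - p[i]).
        W += eta * adv * np.outer(s_new - p, s)
        b += eta * adv * (s_new - p)
        baseline += 0.01 * (r - baseline)
        s = s_new

    print("mean firing rate after training:", spike_probs(s).mean())

Because all agents share one scalar reward, each neuron can improve its own policy using only the network state and the global reward signal, which is what makes the multi-agent decomposition attractive; this sketch is myopic (one-step reward), whereas the full problem would optimise long-run reward.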
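The inverse direction can likewise be sketched under a strong simplifying assumption: if the optimised network sampled states from a Boltzmann-like distribution p(s) proportional to exp(r(s)/lambda), then the reward of each observed state could be read off, up to the scale lambda and an additive constant, from its empirical frequency in recordings. The distributional assumption, the parameter lam, and the helper infer_reward are illustrative, not the method used here; real inverse RL on neural data would require considerably more care.

    import numpy as np
    from collections import Counter

    def infer_reward(observed_states, lam=1.0):
        # observed_states: iterable of binary tuples, e.g. [(0, 1, 0), (1, 1, 0), ...].
        # Returns a dict mapping each observed state to an inferred reward value,
        # using r(s) = lam * log p(s) + const (the constant is dropped).
        counts = Counter(map(tuple, observed_states))
        total = sum(counts.values())
        return {s: lam * np.log(c / total) for s, c in counts.items()}

    # Toy usage: states sampled so that sparse states are more frequent.
    rng = np.random.default_rng(1)
    samples = [tuple((rng.random(3) < 0.2).astype(int)) for _ in range(5000)]
    r_hat = infer_reward(samples)
    print(sorted(r_hat.items(), key=lambda kv: -kv[1])[:3])  # most 'rewarded' states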