Research on both natural intelligence (NI) and artificial intelligence (AI) generally assumes that the future resembles the past: intelligent agents or systems (what we call 'intelligence') observe and act on the world, then use this experience to act on future experiences of the same kind. We call this 'retrospective learning'. For example, an intelligence may see a set of pictures of objects, along with their names, and learn to name them. A retrospective-learning intelligence would merely be able to name more pictures of the same objects. We argue that this is not what true intelligence is about. In many real-world problems, both NIs and AIs will have to learn for an uncertain future. Both must update their internal models to be useful for future tasks, such as naming fundamentally new objects and using these objects effectively in a new context or to achieve previously unencountered goals. We call this ability to learn for the future 'prospective learning'. We articulate four relevant factors that jointly define prospective learning. Continual learning enables intelligences to remember those aspects of the past that they believe will be most useful in the future. Prospective constraints (including biases and priors) help an intelligence find general solutions that will be applicable to future problems. Curiosity motivates taking actions that inform future decision-making, including in previously unencountered situations. Causal estimation enables learning the structure of relations that guides the choice of actions for specific outcomes, even when the specific action-outcome contingencies have never been observed before. We argue that a paradigm shift from retrospective to prospective learning will enable the communities that study intelligence to unite and overcome existing bottlenecks to more effectively explain, augment, and engineer intelligences.

"No man ever steps in the same river twice. For it's not the same river and he's not the same man." - Heraclitus
SUMMARY

While the biological analogue of prediction error has been well characterized in the midbrain dopaminergic system, the possibility of other neuromodulatory systems acting as global reinforcers is a topic of much debate. Reward timing, the phenomenon by which single-unit responses in primary visual cortex (V1) reflect an operantly learned stimulus-reward interval, offers a tractable preparation for investigating reinforcement learning in vivo: theoretical work suggests that reward timing results from the interaction of stimulus-evoked recurrent network activity and a global reinforcement signal that indicates the time of received reward. We hypothesized that this signal is conveyed by cholinergic neurons arising from the basal forebrain (BF), a strong candidate system that projects globally to most cortical regions, has a known role in plasticity, and is involved in attention and the representation of salience. To test the necessity of such a signal for entraining reward timing in V1, rats were trained on an initial stimulus-reward contingency, received a V1 infusion of a neurotoxin that eliminated BF cholinergic terminals, and were subsequently trained on a second contingency. We found that extracellular single-unit recordings from V1 of lesioned animals, but not of saline-infused controls, failed to show shifted neural reports of reward matching the new contingency. Importantly, neurons of lesioned animals continued to display intervals associated with the initial contingency, arguing that cholinergic input is required to learn, but not to express, reward timing activity.

DESCRIPTION

We hypothesized that single-unit responses in animals lacking BF cholinergic innervation in V1 would show perseverant reward timing activity under a novel cue-reward contingency. To test this, animals were chronically implanted with microelectrode arrays in V1 and trained to lick a fixed number of times to receive reward after right- or left-eye stimulation. Following the initial training period, either saline or 192-IgG-saporin, a neurotoxin selective for BF cholinergic neurons and their terminals, was infused into the immediate vicinity of the recording site. After a three-day recovery period, animals were given a final session under the initial contingency and were then trained under a new contingency. Since the reward timing activity of individual neurons is specific to one eye or the other, average firing rates following stimuli to each eye were evaluated with receiver operating characteristic (ROC) analysis. The neural report of reward was defined as the first moment when the area under the ROC curve fell back to chance with 95% confidence. Figure 1 shows example neurons following the contingency change from a control (A) and a lesioned animal (B). While the control unit reports a time that accords well with the new cue-reward interval, the lesioned unit continues to approximate the previous interval (average reward times before/after the contingency change: 1.17 s/1.79 s for the control; 1.49 s/0.89 s for the lesioned animal). Comparing the distributions of reported r...
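As an illustration of that ROC criterion, the sketch below shows one way to compute a per-bin AUC discriminating preferred- from non-preferred-eye firing rates and to locate the first bin, after discrimination has emerged, at which the AUC returns to chance (0.5) with 95% confidence. This is a minimal Python sketch under stated assumptions, not the authors' analysis pipeline: the function name neural_report_time, the 100 ms bin width, the trial-binned data layout, and the bootstrap confidence interval are all illustrative choices.

```python
# Minimal sketch (assumed parameters, not the authors' pipeline) of a
# ROC-based "neural report of reward": per time bin, compute the area under
# the ROC curve (AUC) separating preferred- vs non-preferred-eye trials,
# then find the first bin where the AUC falls back to chance with 95%
# confidence after discrimination has emerged.
import numpy as np
from sklearn.metrics import roc_auc_score

def neural_report_time(rates_pref, rates_nonpref, bin_s=0.1, n_boot=1000, seed=0):
    """rates_pref, rates_nonpref: (n_trials, n_bins) arrays of binned firing
    rates following stimulation of each eye. Returns the neural report of
    reward in seconds, or None if the AUC never returns to chance."""
    rng = np.random.default_rng(seed)
    labels = np.r_[np.ones(len(rates_pref)), np.zeros(len(rates_nonpref))]
    discriminating = False
    for b in range(rates_pref.shape[1]):
        scores = np.r_[rates_pref[:, b], rates_nonpref[:, b]]
        # Bootstrap a 95% confidence interval on the AUC for this bin.
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(scores), len(scores))
            if labels[idx].min() == labels[idx].max():
                continue  # resample drew only one class; AUC undefined
            aucs.append(roc_auc_score(labels[idx], scores[idx]))
        if not aucs:
            continue
        lo, hi = np.percentile(aucs, [2.5, 97.5])
        if not discriminating:
            discriminating = lo > 0.5  # eye preference has emerged
        elif lo <= 0.5 <= hi:
            return b * bin_s  # AUC back at chance: the neural report time
    return None
```

Bootstrapping is just one common way to attach a 95% confidence interval to an AUC; the paper's exact statistical criterion may differ.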