Biological underpinnings for lifelong learning machines

Kudithipudi, Dhireesha; Aguilar-Simon, Mario; Babb, Jonathan; Bazhenov, Maxim; Blackiston, Douglas; Bongard, Josh; Brna, Andrew; Raja, Suraj Chakravarthi; Cheney, Nick; Clune, Jeff; Daram, Anurag; Fusi, Stefano; Helfer, Peter; Kay, Leslie M.; Ketz, Nicholas; Kira, Zsolt; Kolouri, Soheil; Krichmar, Jeffrey L.; Kriegman, Sam; Levin, Michael; Madireddy, Sandeep; Manicka, Santosh; Marjaninejad, Ali; McNaughton, Bruce L.; Miikkulainen, Risto; Navratilova, Zaneta; Pandit, Tej; Parker, Alice C.; Pilly, Praveen K.; Risi, Sebastian; Sejnowski, Terrence J.; Soltoggio, Andrea; Soures, Nicholas; Tolias, Andreas S.; Urbina-Meléndez, Darío; Valero-Cuevas, Francisco J.; Ven, Gido M. van de; Vogelstein, Joshua T.; Wang, Felix; Weiss, Ron; Yanguas‐Gil, Angel; Zou, Xinyun; Siegelmann, Hava T.

doi:10.1038/s42256-022-00452-0

Cited by 119 publications (60 citation statements)

References 219 publications

“…Agents are rewarded for each item obtained in the sequence, with lower rewards for items that have to be collected in bulk and higher rewards for items near the end of the sequence. Agents are optimized with the phasic policy gradient [64]. A major problem when fine-tuning with RL is catastrophic forgetting [65,66], because previously learned skills can be lost before their value is realized. For instance, while our VPT foundation model never exhibits the entire sequence of behaviors required to smelt iron zero-shot, it did train on examples of players smelting with furnaces.…”

Section: Fine-tuning With Reinforcement Learning (mentioning; confidence: 99%)

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Baker, Akkaya, Zhokhov et al., 2022. Preprint (self-citation).

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities [1–6]. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning, wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data (here, online videos of people playing Minecraft), from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20 Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.

* This was a large effort by a dedicated team. Each author made huge contributions on many fronts over long time periods. All members were full time on the project for over six months. BB, IA, PZ, and JC were on the original VPT project team and were thus involved for even longer (over a year). Aside from those original team members, author order is random. It was also randomized between IA and PZ.
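The three-stage pipeline in the VPT abstract (train an inverse dynamics model on a small labeled set, use it to pseudo-label unlabeled videos, then clone a behavioral prior from the pseudo-labels) can be illustrated with a toy, tabular sketch. Everything here is a hypothetical stand-in: the "world" is an integer state with actions in {-1, 0, +1} and next state s + a, the inverse dynamics model is a lookup table keyed on the state change rather than a neural network over video frames, and the behavioral prior is a most-frequent-action table. It shows only the data flow, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toy world: integer state s, action a in {-1, 0, +1}, next state s + a.

def train_idm(labeled):
    """Fit a tabular inverse dynamics model (IDM): (s_t, s_{t+1}) -> a_t.

    `labeled` is a list of (s, a, s_next) triples. This toy IDM looks only at
    the state change, a stand-in for a model that sees past and future frames.
    """
    votes = defaultdict(Counter)
    for s, a, s_next in labeled:
        votes[s_next - s][a] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}

def pseudo_label(idm, trajectories):
    """Label unlabeled state sequences with the IDM's predicted actions."""
    pairs = []
    for states in trajectories:
        for s, s_next in zip(states, states[1:]):
            a = idm.get(s_next - s)
            if a is not None:  # skip transitions the IDM has never seen
                pairs.append((s, a))
    return pairs

def train_behavioral_prior(pairs):
    """Behavior cloning as a tabular policy: most frequent action per state."""
    counts = defaultdict(Counter)
    for s, a in pairs:
        counts[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# A small labeled set (hypothetical stand-in for contractor data)...
small_labeled = [(0, 1, 1), (5, -1, 4), (2, 0, 2)]
# ...and a larger unlabeled set of state sequences (hypothetical stand-in for videos).
unlabeled_videos = [[0, 1, 2, 3], [3, 4, 5], [6, 5, 4]]

idm = train_idm(small_labeled)
policy = train_behavioral_prior(pseudo_label(idm, unlabeled_videos))
print(policy)
```

The point of the design is that the IDM, which can peek at the next state, is an easier learning problem than the causal policy, so a small labeled set suffices to label a much larger corpus for behavior cloning.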

“…One long-known failing of artificial neural networks is “catastrophic forgetting” of previously recognized entities after synaptic weights are modified to represent more recently experienced entities (French, 1999). Biological organisms achieve “lifelong learning” by a variety of mechanisms, some of which have been applied in AI (Kudithipudi et al., 2022). Learning principles of causation instead of memorizing experiences may avoid the problem in the first place.…”

Section: Discovering Causality (mentioning; confidence: 99%)

Developing Intelligent Robots that Grasp Affordance

Loeb, 2022. Front. Robot. AI

Humans and robots operating in unstructured environments both need to classify objects through haptic exploration and use them in various tasks, but currently they differ greatly in their strategies for acquiring such capabilities. This review explores nascent technologies that promise more convergence. A novel form of artificial intelligence classifies objects according to sensory percepts during active exploration and decides on efficient sequences of exploratory actions to identify objects. Representing objects according to the collective experience of manipulating them provides a substrate for discovering causality and affordances. Such concepts that generalize beyond explicit training experiences are an important aspect of human intelligence that has eluded robots. For robots to acquire such knowledge, they will need an extended period of active exploration and manipulation similar to that employed by infants. The efficacy, efficiency and safety of such behaviors depend on achieving smooth transitions between movements that change quickly from exploratory to executive to reflexive. Animals achieve such smoothness by using a hierarchical control scheme that is fundamentally different from those of conventional robotics. The lowest level of that hierarchy, the spinal cord, starts to self-organize during spontaneous movements in the fetus. This allows its connectivity to reflect the mechanics of the musculoskeletal plant, a bio-inspired process that could be used to adapt spinal-like middleware for robots. Implementation of these extended and essential stages of fetal and infant development is impractical, however, for mechatronic hardware that does not heal and replace itself like biological tissues. Instead, such development can now be accomplished in silico and then cloned into physical robots, a strategy that could transcend human performance.
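The “catastrophic forgetting” quoted in the Loeb excerpt (synaptic weights overwritten when a network is trained on more recent experiences) can be reproduced in a deliberately extreme toy. This is a hypothetical illustration, not a method from any of the listed papers: a single-weight linear model is trained by plain SGD on task A (y = 2x), then on task B (y = -x) with no replay or regularization, and its error on task A is measured before and after.

```python
def mse(w, data):
    """Mean squared error of the one-weight model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, lr=0.05, epochs=200):
    """Plain per-sample SGD on squared error; returns the updated weight."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
            w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5]
task_a = [(x, 2.0 * x) for x in xs]   # hypothetical task A: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # hypothetical task B: y = -x

w = sgd(0.0, task_a)
loss_a_before = mse(w, task_a)   # near zero: task A has been learned
w = sgd(w, task_b)               # sequential training on B only
loss_a_after = mse(w, task_a)    # large: the weight now serves B, and A is forgotten
print(loss_a_before, loss_a_after)
```

Because both tasks share the one weight and want incompatible values for it, learning B necessarily erases A; mechanisms surveyed under the heading of lifelong learning aim to avoid this kind of interference in larger models.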

“…symbolic AI). Noise tolerance in this sense is for example also an issue of lifelong learning machines (Kudithipudi et al., 2022).…”

Section: Contributions (mentioning; confidence: 99%)

Testing robustness of predictions of trained classifiers against naturally occurring perturbations

Scher, Trügler, 2022. Preprint

Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and thus, ultimately, for generating trust in the models. We show that the widely used concept of adversarial robustness and closely related metrics based on counterfactuals are not necessarily valid metrics for determining the robustness of ML models against perturbations that occur "naturally", outside specific adversarial attack scenarios. Additionally, we argue that generic robustness metrics in principle are insufficient for determining real-world robustness. Instead we propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a real-world perturbation will change a prediction, thus giving quantitative information on the robustness of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on two datasets, as well as on analytically solvable cases. Finally, we discuss ideas on how real-world robustness could be computed or estimated in high-dimensional input spaces.
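The probabilistic idea in this abstract (sample perturbations from an application-specific model, query the black-box classifier, and count how often the prediction changes) can be sketched as a plain Monte Carlo estimator. This is a minimal sketch under stated assumptions, not the authors' code: the function name `perturbation_flip_probability`, the linear threshold classifier, and the independent Gaussian noise model (sigma = 0.1) are all hypothetical choices for illustration.

```python
import math
import random

def perturbation_flip_probability(classify, x, sample_perturbation, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P[classify(x + delta) != classify(x)].

    classify: black-box function mapping an input vector to a label.
    sample_perturbation: draws one perturbation delta (same length as x) from an
        application-specific distribution chosen by the user (assumed here).
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    base = classify(x)
    flips = 0
    for _ in range(n_samples):
        delta = sample_perturbation(rng)
        perturbed = [xi + di for xi, di in zip(x, delta)]
        if classify(perturbed) != base:
            flips += 1
    p = flips / n_samples
    se = math.sqrt(p * (1 - p) / n_samples)  # binomial standard error of the estimate
    return p, se

# Hypothetical black-box model: a linear threshold classifier in 2-D.
def classify(v):
    return int(v[0] + v[1] > 1.0)

# Hypothetical "natural" perturbation model: independent Gaussian noise, sigma = 0.1.
def gaussian_noise(rng):
    return [rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)]

p, se = perturbation_flip_probability(classify, [0.5, 0.45], gaussian_noise)
print(f"flip probability ~ {p:.3f} +/- {se:.3f}")
```

For this toy case the answer is analytically solvable: the input sits 0.05 below the decision boundary and the summed noise has standard deviation 0.1·√2, so the exact flip probability is about 0.36, which the estimate should approach. The cost of one estimate is `n_samples` classifier calls, and the number of samples needed to cover a perturbation distribution grows quickly with input dimension, consistent with the abstract's note that the method suits low-dimensional inputs.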
