2002
DOI: 10.1162/08997660260293319

Many-Layered Learning

Abstract: We explore incremental assimilation of new knowledge by sequential learning. Of particular interest is how a network of many knowledge layers can be constructed in an on-line manner, such that the learned units represent building blocks of knowledge that serve to compress the overall representation and facilitate transfer. We motivate the need for many layers of knowledge, and we advocate sequential learning as an avenue for promoting the construction of layered knowledge structures. Finally, our novel STL alg…
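To make the abstract's layer-by-layer idea concrete, here is a minimal sketch in Python. It is an illustration of sequential, compositional learning under stated assumptions, not the paper's STL algorithm: each concept is fit by a single logistic unit, then frozen, and its output becomes an extra input feature for concepts learned later. The names (`LayeredPool`, `train_unit`) and all hyperparameters are hypothetical.

```python
# Hedged sketch of layer-by-layer sequential learning: each newly learned unit
# is frozen and its output becomes an extra input feature for later units.
# Illustrative only; not the STL algorithm from Utgoff & Stracuzzi (2002).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_unit(X, y, epochs=3000, lr=1.0):
    """Fit a single logistic unit with plain batch gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])      # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

class LayeredPool:
    """Pool of frozen units; later concepts see earlier concepts' outputs."""
    def __init__(self):
        self.units = []                                 # frozen weight vectors

    def features(self, X, n_units=None):
        """Raw inputs augmented with outputs of the first n_units frozen units."""
        n = len(self.units) if n_units is None else n_units
        F = X
        for w in self.units[:n]:
            Fb = np.hstack([F, np.ones((F.shape[0], 1))])
            F = np.hstack([F, sigmoid(Fb @ w)[:, None]])
        return F

    def learn_concept(self, X, y):
        self.units.append(train_unit(self.features(X), y))

    def predict(self, X, idx):
        """Output of unit idx, computed on the features available when it was learned."""
        F = self.features(X, n_units=idx)
        Fb = np.hstack([F, np.ones((F.shape[0], 1))])
        return sigmoid(Fb @ self.units[idx])

# Toy curriculum: OR and AND are learned first, then reused to learn XOR,
# which is not linearly separable in the raw inputs alone.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
pool = LayeredPool()
pool.learn_concept(X, np.array([0., 1., 1., 1.]))       # OR
pool.learn_concept(X, np.array([0., 0., 0., 1.]))       # AND
pool.learn_concept(X, np.array([0., 1., 1., 0.]))       # XOR, built on OR and AND
print(np.round(pool.predict(X, 2)))                     # should approach [0 1 1 0]
```

The toy curriculum learns OR and AND before XOR, so the final unit only has to find a linear separation over the earlier units' outputs, which is not possible over the raw inputs alone.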

Cited by 93 publications (33 citation statements)
References 24 publications
“…See also (Holden, 1994; Wang et al, 1994; Amari and Murata, 1993; Guyon et al, 1992; Vapnik, 1992; Wolpert, 1994). Similar priors (or biases towards simplicity) are implicit in constructive and pruning algorithms, e.g., layer-by-layer sequential network construction (e.g., Ivakhnenko, 1968, 1971; Ash, 1989; Moody, 1989; Gallant, 1988; Honavar and Uhr, 1988; Ring, 1991; Fahlman, 1991; Weng et al, 1992; Honavar and Uhr, 1993; Burgess, 1994; Fritzke, 1994; Parekh et al, 2000; Utgoff and Stracuzzi, 2002) (see also Sec. 5.3, 5.11), input pruning (Moody, 1992; Refenes et al, 1994), unit pruning (e.g., Ivakhnenko, 1968, 1971; White, 1989; Mozer and Smolensky, 1989; Levin et al, 1994), weight pruning, e.g., optimal brain damage (LeCun et al, 1990b), and optimal brain surgeon (Hassibi and Stork, 1993).…”
Section: Better BP Through Advanced Gradient Descent (Compare Sec. 5.24) (mentioning)
confidence: 99%
“…There are many theoretical results from circuits complexity analysis which clearly indicate that circuits with a small number of layers can be extremely inefficient at representing functions that can otherwise be represented compactly with a deep circuit (Hastad, 1987;Allender, 1996). See (Utgoff & Stracuzzi, 2002;Bengio & Le Cun, 2007) for discussions of this question in the context of learning architectures.…”
Section: The Problem With Shallow Architectures (mentioning)
confidence: 99%
“…We have already discussed the example of parity in the previous two sections. Other arguments can be brought to bear to strongly suggest that learning of more abstract functions is much more efficient when it is done sequentially, composing previously learned concepts in order to represent and learn more abstract concepts (Utgoff & Stracuzzi, 2002).…”
Section: Learning Abstractions One On Top of the Other (mentioning)
confidence: 99%
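The parity example mentioned in this statement can be illustrated with a small, assumption-level sketch (not code from either cited paper): a flat disjunctive normal form for n-bit parity needs 2^(n-1) AND terms, whereas a layered chain of two-input XOR units needs only n - 1 units.

```python
# Illustrative comparison of a shallow versus a layered representation of parity.
from itertools import product

def dnf_terms_for_parity(n):
    """Minterms with an odd number of 1s, i.e. the AND terms a flat DNF needs."""
    return [bits for bits in product([0, 1], repeat=n) if sum(bits) % 2 == 1]

def layered_parity(bits):
    """Parity as a chain of two-input XOR units: unit k combines unit k-1 with bit k."""
    acc = bits[0]
    for b in bits[1:]:
        acc ^= b                        # one small, reusable unit per extra bit
    return acc

n = 8
print(len(dnf_terms_for_parity(n)))     # 128 AND terms for the shallow (depth-2) form
print(n - 1)                            # 7 XOR units for the layered form
print(layered_parity([1, 0, 1, 1, 0, 0, 1, 0]))   # parity of one example input: 0
```

The gap between 2^(n-1) and n - 1 grows exponentially with n, which is the kind of inefficiency of shallow representations these statements refer to.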
“…In another algorithm-specific technique, Many-layered learning (Utgoff & Stracuzzi, 2002) learns from an input stream how to choose the layers in a feed-forward neural network. Once a concept is learned, it is used as input to things that are still unlearned.…”
Section: Task Decomposition (mentioning)
confidence: 99%
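A hedged sketch of the "learned, then reused as input" gating described in this statement: a rolling-error test decides when a concept counts as learned and may be frozen as a feature for later concepts. The class name, window size, and error threshold are illustrative assumptions, not details taken from Utgoff & Stracuzzi (2002).

```python
# Minimal sketch of an on-line "is this concept learned yet?" gate.
from collections import deque

class FreezeGate:
    """Declare a concept 'learned' once its rolling error stays under a threshold."""
    def __init__(self, window=50, max_error=0.05):
        self.errors = deque(maxlen=window)
        self.max_error = max_error
        self.frozen = False

    def record(self, error):
        """Feed one per-example error from the stream; return whether the concept is frozen."""
        if self.frozen:
            return True
        self.errors.append(error)
        window_full = len(self.errors) == self.errors.maxlen
        if window_full and sum(self.errors) / len(self.errors) < self.max_error:
            self.frozen = True          # from now on it serves only as an input feature
        return self.frozen

# Toy stream of per-example errors for one concept being learned on-line.
gate = FreezeGate(window=5, max_error=0.1)
for err in [0.4, 0.3, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0]:
    gate.record(err)
print(gate.frozen)   # True once the last 5 errors average below 0.1
```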