2002
DOI: 10.1162/08997660260293319

Many-Layered Learning

Abstract: We explore incremental assimilation of new knowledge by sequential learning. Of particular interest is how a network of many knowledge layers can be constructed in an on-line manner, such that the learned units represent building blocks of knowledge that serve to compress the overall representation and facilitate transfer. We motivate the need for many layers of knowledge, and we advocate sequential learning as an avenue for promoting the construction of layered knowledge structures. Finally, our novel STL alg…
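To make the abstract's layer-by-layer idea concrete, here is a minimal sketch in Python. It is an illustration of sequential, compositional learning under stated assumptions, not the paper's STL algorithm: each concept is fit by a single logistic unit, then frozen, and its output becomes an extra input feature for concepts learned later. The names (`LayeredPool`, `train_unit`) and all hyperparameters are hypothetical.

```python
# Hedged sketch of layer-by-layer sequential learning: each newly learned unit
# is frozen and its output becomes an extra input feature for later units.
# Illustrative only; not the STL algorithm from Utgoff & Stracuzzi (2002).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_unit(X, y, epochs=3000, lr=1.0):
    """Fit a single logistic unit with plain batch gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])      # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

class LayeredPool:
    """Pool of frozen units; later concepts see earlier concepts' outputs."""
    def __init__(self):
        self.units = []                                 # frozen weight vectors

    def features(self, X, n_units=None):
        """Raw inputs augmented with outputs of the first n_units frozen units."""
        n = len(self.units) if n_units is None else n_units
        F = X
        for w in self.units[:n]:
            Fb = np.hstack([F, np.ones((F.shape[0], 1))])
            F = np.hstack([F, sigmoid(Fb @ w)[:, None]])
        return F

    def learn_concept(self, X, y):
        self.units.append(train_unit(self.features(X), y))

    def predict(self, X, idx):
        """Output of unit idx, computed on the features available when it was learned."""
        F = self.features(X, n_units=idx)
        Fb = np.hstack([F, np.ones((F.shape[0], 1))])
        return sigmoid(Fb @ self.units[idx])

# Toy curriculum: OR and AND are learned first, then reused to learn XOR,
# which is not linearly separable in the raw inputs alone.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
pool = LayeredPool()
pool.learn_concept(X, np.array([0., 1., 1., 1.]))       # OR
pool.learn_concept(X, np.array([0., 0., 0., 1.]))       # AND
pool.learn_concept(X, np.array([0., 1., 1., 0.]))       # XOR, built on OR and AND
print(np.round(pool.predict(X, 2)))                     # should approach [0 1 1 0]
```

The toy curriculum learns OR and AND before XOR, so the final unit only has to find a linear separation over the earlier units' outputs, which is not possible over the raw inputs alone.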

Cited by 93 publications (33 citation statements)
References 24 publications
“…See also (Holden, 1994; Wang et al, 1994; Amari and Murata, 1993; Guyon et al, 1992; Vapnik, 1992; Wolpert, 1994). Similar priors (or biases towards simplicity) are implicit in constructive and pruning algorithms, e.g., layer-by-layer sequential network construction (e.g., Ivakhnenko, 1968, 1971; Ash, 1989; Moody, 1989; Gallant, 1988; Honavar and Uhr, 1988; Ring, 1991; Fahlman, 1991; Weng et al, 1992; Honavar and Uhr, 1993; Burgess, 1994; Fritzke, 1994; Parekh et al, 2000; Utgoff and Stracuzzi, 2002) (see also Sec. 5.3, 5.11), input pruning (Moody, 1992; Refenes et al, 1994), unit pruning (e.g., Ivakhnenko, 1968, 1971; White, 1989; Mozer and Smolensky, 1989; Levin et al, 1994), weight pruning, e.g., optimal brain damage (LeCun et al, 1990b), and optimal brain surgeon (Hassibi and Stork, 1993).…”
Section: Better BP Through Advanced Gradient Descent (Compare Sec. 5.24) (mentioning)
confidence: 99%
“…There are many theoretical results from circuits complexity analysis which clearly indicate that circuits with a small number of layers can be extremely inefficient at representing functions that can otherwise be represented compactly with a deep circuit (Hastad, 1987;Allender, 1996). See (Utgoff & Stracuzzi, 2002;Bengio & Le Cun, 2007) for discussions of this question in the context of learning architectures.…”
Section: The Problem With Shallow Architectures (mentioning)
confidence: 99%
“…We have already discussed the example of parity in the previous two sections. Other arguments can be brought to bear to strongly suggest that learning of more abstract functions is much more efficient when it is done sequentially, composing previously learned concepts in order to represent and learn more abstract concepts (Utgoff & Stracuzzi, 2002).…”
Section: Learning Abstractions One On Top of the Other (mentioning)
confidence: 99%
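The parity example mentioned in this statement can be illustrated with a small, assumption-level sketch (not code from either cited paper): a flat disjunctive normal form for n-bit parity needs 2^(n-1) AND terms, whereas a layered chain of two-input XOR units needs only n - 1 units.

```python
# Illustrative comparison of a shallow versus a layered representation of parity.
from itertools import product

def dnf_terms_for_parity(n):
    """Minterms with an odd number of 1s, i.e. the AND terms a flat DNF needs."""
    return [bits for bits in product([0, 1], repeat=n) if sum(bits) % 2 == 1]

def layered_parity(bits):
    """Parity as a chain of two-input XOR units: unit k combines unit k-1 with bit k."""
    acc = bits[0]
    for b in bits[1:]:
        acc ^= b                        # one small, reusable unit per extra bit
    return acc

n = 8
print(len(dnf_terms_for_parity(n)))     # 128 AND terms for the shallow (depth-2) form
print(n - 1)                            # 7 XOR units for the layered form
print(layered_parity([1, 0, 1, 1, 0, 0, 1, 0]))   # parity of one example input: 0
```

The gap between 2^(n-1) and n - 1 grows exponentially with n, which is the kind of inefficiency of shallow representations these statements refer to.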
“…In another algorithm-specific technique, Many-layered learning (Utgoff & Stracuzzi, 2002) learns from an input stream how to choose the layers in a feed-forward neural network. Once a concept is learned, it is used as input to things that are still unlearned.…”
Section: Task Decomposition (mentioning)
confidence: 99%
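A hedged sketch of the "learned, then reused as input" gating described in this statement: a rolling-error test decides when a concept counts as learned and may be frozen as a feature for later concepts. The class name, window size, and error threshold are illustrative assumptions, not details taken from Utgoff & Stracuzzi (2002).

```python
# Minimal sketch of an on-line "is this concept learned yet?" gate.
from collections import deque

class FreezeGate:
    """Declare a concept 'learned' once its rolling error stays under a threshold."""
    def __init__(self, window=50, max_error=0.05):
        self.errors = deque(maxlen=window)
        self.max_error = max_error
        self.frozen = False

    def record(self, error):
        """Feed one per-example error from the stream; return whether the concept is frozen."""
        if self.frozen:
            return True
        self.errors.append(error)
        window_full = len(self.errors) == self.errors.maxlen
        if window_full and sum(self.errors) / len(self.errors) < self.max_error:
            self.frozen = True          # from now on it serves only as an input feature
        return self.frozen

# Toy stream of per-example errors for one concept being learned on-line.
gate = FreezeGate(window=5, max_error=0.1)
for err in [0.4, 0.3, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0]:
    gate.record(err)
print(gate.frozen)   # True once the last 5 errors average below 0.1
```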