This thesis is about the neocognitron, a neural network that was proposed by Fukushima in 1979. Inspired by Hubel and Wiesel's serial model of processing in the visual cortex, the neocognitron was initially intended as a self-organizing model of vision; however, we are concerned with the supervised version of the network, put forward by Fukushima in 1983. Through "training with a teacher", Fukushima hoped to obtain a character recognition system that was tolerant of shifts and deformations in input images. Until now, though, it has not been clear whether Fukushima's approach has resulted in a network that can rival the performance of other recognition systems.

In the first three chapters of this thesis, the biological basis, operational principles [...] SHOP serves as a method for probing the behaviour of the neocognitron and is used to investigate the effect of cell masks, skeletonization of input data and choice of training patterns on the network's performance. Even though SHOP is the best selectivity adjustment algorithm to be described to date, the system's peak correct recognition rate (for isolated ZIP code digits from the CEDAR database) is around 75% (with 75% reliability) after SHOP training. It is clear that the neocognitron, as originally described by Fukushima, is unable to match the performance of today's most accurate digit recognition systems, which typically achieve 90% correct recognition with near 100% reliability.

After observing the neocognitron's failure to exploit the distinguishing features of different kinds of digits in its classification of images, Chapter 6 proposes modifications to enhance the network's ability in this regard. Using this new architecture, a correct classification rate of 84.62% (with 96.36% reliability) was obtained on CEDAR ZIP codes, a substantial improvement but still a level of performance that is somewhat below state-of-the-art recognition rates.
Chapter 6 concludes with a critical review of the hierarchical feature extraction paradigm.

The final chapter summarizes the material presented in this thesis and draws the significant findings together in a series of conclusions. In addition to the investigation of the neocognitron, this thesis also contains a derivation of statistical bounds on the errors that arise in multilayer feedforward networks as a result of weight perturbation (Appendix E).