Learning algorithms have been used both on feed-forward deterministic networks and on feed-back statistical networks to capture input-output relations and do pattern classification. These learning algorithms are examined for a class of problems characterized by noisy or statistical data, in which the networks learn the relation between input data and probability distributions of answers.

When an input instance α is presented, there is a probability that each proposition i is true and a probability Q_{i,α} that the proposition is false. The object of the network learning is to capture the I_{k,α} → Q_{i,α} relationship, which is all the information that is known about the implication of the input instance α. This information can subsequently be used in a variety of modes, of which the simplest would be to choose an action based on maximum likelihood by using these probabilities.

A computational probabilistic approach to a task is exemplified in hidden Markov approaches to speech-to-text conversion (12). The ensemble of speech utterances is described in terms of word models, using a Markov description of the possible sound patterns associated with a given word. When a particular utterance is heard, the probability that each word model might generate that sound is evaluated. Sequences of such probabilities can then be used for word selection (13). The problem is intrinsically probabilistic because individual words often cannot be unambiguously understood in a context-free and speaker-independent fashion, and because the analysis done may intrinsically ignore evidence necessary to distinguish accurately between similar sounds. A feed-forward network for doing such a task should generate the probabilities of occurrence of words as its outputs.

Both the deterministic and the stochastic networks to be discussed will be given the same task, namely, to capture the probability of the truth of a set of propositions based on a given set of instances by using a learning algorithm. E. Baum and F.
Wilczek (personal communication) have considered the utility of learning a probability distribution with an analog perceptron. Anderson and Abrahams (14) have discussed more elaborate uses of probabilities in deterministic networks.
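The word-model scheme described above can be sketched in a few lines: each word model assigns a likelihood to an observed sound sequence via the standard forward recursion, and the word is then chosen by maximum likelihood. The two "word models," their parameters, and the observation symbols below are invented for illustration and are not taken from ref. 12.

```python
import math

def sequence_log_prob(model, observations):
    """Forward algorithm: log P(observations | word model)."""
    init, trans, emit = model  # initial, transition, emission probabilities
    n = len(init)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs]
                 for s in range(n)]
    return math.log(sum(alpha))

# Two hypothetical two-state word models over observation symbols 0 and 1.
models = {
    "yes": ([0.8, 0.2], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]),
    "no":  ([0.5, 0.5], [[0.6, 0.4], [0.5, 0.5]], [[0.3, 0.7], [0.8, 0.2]]),
}

utterance = [0, 0, 1, 0]
scores = {w: sequence_log_prob(m, utterance) for w, m in models.items()}
best = max(scores, key=scores.get)  # maximum-likelihood word choice
```

In practice the per-word probabilities themselves, rather than only the argmax, carry the information that a probabilistic network would be trained to reproduce.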
Analog Perceptron

Consider a multilayer, feed-forward analog perceptron. Although what is described in this section can be extended to systems having a large number of layers, we will for simplicity restrict consideration to a system having three layers of analog units and two layers of connections (Fig. 1a). The outputs of the first layer are forced by the input data. When input case α is present, the input data are the outputs of these units k and are given by I_{k,α}.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
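A minimal sketch of the three-layer structure just described may help fix the notation: the first layer is clamped to the input data I_{k,α}, and each later unit applies a smooth (analog) nonlinearity to its weighted net input. The layer sizes, the random weights, and the sigmoid choice are illustrative assumptions, not specifics from the paper.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Three layers of analog units, two layers of connections (cf. Fig. 1a).
# Sizes and weights are illustrative only.
n_in, n_hid, n_out = 4, 3, 2
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]

def forward(I_alpha):
    """First-layer outputs are forced to I_{k,alpha}; deeper units are analog."""
    hidden = [sigmoid(sum(w * i for w, i in zip(row, I_alpha))) for row in W1]
    # Output activities lie in (0, 1), so they can be read as the network's
    # estimate of the probability that each proposition is true.
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W2]

outputs = forward([1.0, 0.0, 1.0, 0.0])
```

With untrained random weights these outputs are of course arbitrary; the learning algorithm's job is to move them toward the true probabilities for each input instance.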