This chapter presents contributions to the theory of on-line training of artificial neural networks (ANN) with the multilayer perceptron (MLP) topology. By on-line training we mean that the learning process is conducted while the system is performing its signal processing, i.e., the neural network continuously adjusts its free parameters in real time, in response to variations in the incoming signal (Haykin, 1999). An artificial neural network is a massively parallel distributed processor made up of simple processing units, which have a natural propensity to store experiential knowledge and make it available for use (Haykin, 1999). These units (also called neurons) are nonlinear adaptive devices that are very simple in terms of computing power and memory; when interconnected, however, they offer enormous potential for nonlinear mappings. The learning algorithm is the procedure used to carry out the learning process; its function is to modify the synaptic weights of the network in an orderly manner to achieve a desired design objective (Haykin, 1999).

Although initially applied only to pattern recognition and signal and image processing, ANN are now used to solve problems in many areas of human knowledge. An important feature of an ANN is its ability to generalize, i.e., to provide adequate responses to patterns not presented during the training phase. Among the factors that influence the generalization ability of an ANN are the network topology and the type of algorithm used to train the network. The network topology comprises the number of inputs and outputs, the number of layers, the number of neurons per layer and the activation function.

Following the work of Cybenko (1989), networks with the MLP topology came into widespread use, since they are universal approximators of continuous functions. Basically, an MLP network is subdivided into the following layers: an input layer, one or more intermediate or hidden layers, and an output layer. The operation of an MLP network is synchronous: given an input vector, each layer multiplies it by its weights, applies the activation function (the model of each neuron includes a nonlinear activation function that is differentiable at every point) and propagates the result to the next layer until the output layer is reached. Issues such as giving the system enough flexibility to avoid biased solutions (underfitting) and, conversely, limiting the complexity of the network topology to avoid highly variable solutions (overfitting) are central to obtaining good generalization.
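To make the synchronous forward pass and the on-line adjustment of the synaptic weights described above concrete, the following sketch implements a single-hidden-layer MLP that adapts its weights after every incoming sample, as on-line operation requires. It is a minimal illustration under stated assumptions, not the chapter's own method: the layer sizes, the sigmoid activation, the squared-error criterion and the learning rate eta are all illustrative choices.

```python
import numpy as np

def sigmoid(x):
    # Nonlinear activation, differentiable at every point.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 1                     # illustrative sizes
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))  # hidden -> output weights
eta = 0.1                                           # assumed learning rate

def online_step(x, d):
    """Propagate one input vector x and adapt the weights toward target d."""
    global W1, W2
    # Synchronous forward pass: each layer multiplies the incoming vector
    # by its weights and applies the activation function.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # On-line mode: back-propagate the error of this single sample and
    # adjust the synaptic weights immediately, before the next sample.
    delta_out = (d - y) * y * (1.0 - y)
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    W2 += eta * np.outer(delta_out, h)
    W1 += eta * np.outer(delta_hid, x)
    return y

# Usage: feed samples one at a time, as in real-time signal processing.
y = online_step(rng.normal(size=n_in), np.array([1.0]))
```

Because every sample triggers an immediate weight update, the network can track variations in the incident signal as they occur; in batch training, by contrast, the weights would only change after a full pass over the data.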