Training algorithms for deep learning have recently been proposed with notable success, surpassing the state of the art in areas such as audio, speech and language processing. The key role is played by learning multiple levels of abstraction in a deep architecture. However, searching the parameter space of a deep architecture is a difficult task. By exploiting a greedy layer-wise unsupervised training strategy, the network parameters are initialized near a good local minimum. Even so, many existing deep learning algorithms require tuning a number of hyperparameters, including learning factors and the number of hidden units in each layer. In addition, the predominant methodology for training deep learning models relies on gradient-based algorithms, which require heavy computational resources. Poor training algorithms and an excessive number of user-chosen parameters make it difficult to train a deep learner. In this dissertation, we break down the training of deep learning into basic building blocks: unsupervised approximation training followed by a supervised classification learning block. We propose a multi-step training method for designing generalized linear classifiers. First, an initial multi-class linear classifier is found through regression. Then validation error is minimized by pruning unnecessary inputs. Simultaneously, desired outputs are improved via a method similar to the Ho-Kashyap rule. Next, the output discriminants are scaled to be net functions of sigmoidal output units in a generalized linear classifier. This classifier is trained via Newton's algorithm. Performance gains are demonstrated at each step. We then develop a family of batch training algorithms for the multilayer perceptron (MLP) that optimize its hidden layer size and number of training epochs.
At the end of each training epoch, median filtering smooths the noise in the curve of validation error versus number of hidden units, and the network is temporarily pruned. Since pruning is done at each epoch, we save the best network, thereby optimizing the number of hidden units and the number of epochs simultaneously. Next, we combine pruning with a growing approach. Later, the input units are scaled to be the net functions of the sigmoidal output units, which are then fed as inputs to the MLP. We then propose improvements to each of the deep learning blocks, thereby improving the overall performance of the deep architecture. We discuss the principles and formulation of learning algorithms for deep autoencoders. We investigate several problems in deep autoencoder networks, including training issues, the theoretical, mathematical and experimental justification that the networks are linear, optimizing the number of hidden units in each layer, and determining the depth of the deep learning model. A direct implication of the current work is the ability to construct fast deep learning models using desktop-level computational resources. This, in our opinion, promotes our desig...