Conventionally, autoencoders are unsupervised representation learning tools. In this work, we propose a novel discriminative autoencoder. The use of supervised discriminative learning ensures that the learned representation is robust to variations commonly encountered in image datasets. Using the basic discriminative autoencoder as a unit, we build a stacked architecture aimed at extracting relevant representations from the training data. The efficiency of our feature extraction algorithm ensures high classification accuracy even with simple classification schemes like KNN (K-nearest neighbor). We demonstrate the superiority of our model for representation learning by conducting experiments on standard datasets for character/image recognition, and by subsequent comparison with existing supervised deep architectures such as the class sparse stacked autoencoder and the discriminative deep belief network.

The basic building blocks of these deep architectures are either stochastic RBMs (Restricted Boltzmann Machines) [16] or deterministic autoencoders [17]. Given a training dataset, an RBM learns the network weights such that the similarity between the projection of the training data and the learned representation is maximized. An autoencoder (AE), on the other hand, consists of two networks: the first maps the input (training data) to the representation / feature space; the second maps the representation space back to the output (training data). Thus, an AE approximates an identity operator. This may sound trivial, but by constraining the nodes or connections of the networks one can learn interesting representations of the data.

RBMs and AEs are shallow architectures. Proponents of deep learning believe that better (more compact / abstract) representations can be learnt by going deeper. However, learning the network weights for several layers is a difficult task.
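To make the AE description concrete, the following is a minimal NumPy sketch (not the paper's model): an encoder maps the input to a lower-dimensional code and a decoder maps the code back to a reconstruction, so together they approximate the identity; the bottleneck dimension is the constraint that makes this non-trivial. All dimensions, learning rates, and step counts here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 100 samples of dimension 8, compressed to a 3-dimensional code.
X = rng.random((100, 8))
W_enc = rng.normal(scale=0.1, size=(8, 3))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(3, 8))   # decoder weights
lr = 0.5

def forward(X, W_enc, W_dec):
    H = sigmoid(X @ W_enc)                   # representation / feature space
    X_hat = H @ W_dec                        # reconstruction of the input
    return H, X_hat, np.mean((X - X_hat) ** 2)

_, _, loss_before = forward(X, W_enc, W_dec)
for _ in range(200):                         # plain gradient descent on MSE
    H, X_hat, _ = forward(X, W_enc, W_dec)
    G = 2.0 * (X_hat - X) / X.size           # d(loss)/d(X_hat)
    W_dec -= lr * H.T @ G
    G_h = (G @ W_dec.T) * H * (1 - H)        # backprop through the sigmoid
    W_enc -= lr * X.T @ G_h
_, _, loss_after = forward(X, W_enc, W_dec)
print(loss_after < loss_before)              # reconstruction error decreases
```

The decreasing reconstruction error shows the two networks jointly approximating the identity through the bottleneck, which is exactly the behaviour the constrained-AE argument relies on.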
Usually there is not enough data; the network overfits and loses its generalization ability, thereby yielding subpar results at the operational stage. In [17], the authors presented a greedy mechanism for training multilayer (stacked) architectures, wherein each layer is individually trained to yield the best possible representation, which in turn acts as input to the subsequent layer. Because the greedy approach learns only one network at a time, it has fewer parameters to learn per stage, so even with limited training data it yields better results during operation.
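The greedy layer-wise procedure can be sketched as follows (a simplified illustration, not the cited method's exact training rule): each autoencoder layer is trained in isolation on the previous layer's codes, so only one set of weights is optimized at a time. The layer sizes and training hyperparameters below are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ae(X, code_dim, steps=200, lr=0.5):
    """Train one autoencoder layer by gradient descent on reconstruction
    error; return its encoder weights and the codes it produces."""
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))
    for _ in range(steps):
        H = sigmoid(X @ W_enc)
        X_hat = H @ W_dec
        G = 2.0 * (X_hat - X) / X.size
        W_dec -= lr * H.T @ G
        G_h = (G @ W_dec.T) * H * (1 - H)
        W_enc -= lr * X.T @ G_h
    return W_enc, sigmoid(X @ W_enc)

X = rng.random((100, 16))
layer_dims = [8, 4]              # two stacked layers, trained greedily
H, encoders = X, []
for dim in layer_dims:
    W, H = train_ae(H, dim)      # the next layer sees this layer's codes
    encoders.append(W)

print(H.shape)                   # the deep representation: (100, 4)
```

Note that at no point are two layers' weights updated together: each call to `train_ae` is an independent shallow problem, which is why the approach remains feasible with limited training data.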