Multiple preprocessing techniques are used to enhance classification performance. Researchers have incorporated various methods to improve feature extraction from image and text datasets. Extracting optimized features remains a challenge, not only because datasets are very large but also because they contain noisy components. In this manuscript, we propose a method to compute the most optimized features from datasets by first transforming the features via optimal ZCA whitening and then feeding them into a 2-layered Stacked Autoencoder (SA). The novelty of the manuscript lies in deriving optimal features by performing optimal whitening (instead of plain ZCA), followed by feature extraction via stacked autoencoders. The in-depth features learned by the SA are then fed into a feed-forward neural network as the final classification step. We have experimented rigorously on 10 non-computer-vision datasets, 2 hyperspectral remote sensing datasets, and 1 computer vision dataset, and found that classification accuracy increases by 2-4% on average across many datasets. Additionally, we have validated the feature extraction method on a plain vanilla neural network architecture and found performance to increase by up to 5% on a few datasets.
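The pipeline described above (whitening, stacked autoencoder feature learning, feed-forward classification) can be sketched as follows. This is a minimal illustration in Python with NumPy and Keras, assuming plain ZCA whitening rather than the proposed optimal-whitening variant, illustrative layer sizes, and synthetic data; none of these specific choices come from the manuscript itself.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def zca_whiten(X, eps=1e-5):
    """Plain ZCA whitening: decorrelate features while staying close to the input space."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # symmetric covariance -> eigh
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

def train_autoencoder_layer(X, n_hidden, epochs=20):
    """Train one autoencoder layer; return the encoded features and the encoder."""
    inp = keras.Input(shape=(X.shape[1],))
    code = layers.Dense(n_hidden, activation="relu")(inp)
    out = layers.Dense(X.shape[1], activation="linear")(code)
    ae = keras.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X, X, epochs=epochs, batch_size=64, verbose=0)
    encoder = keras.Model(inp, code)
    return encoder.predict(X, verbose=0), encoder

# Hypothetical data: feature matrix X and integer class labels y
X = np.random.rand(500, 64).astype("float32")
y = np.random.randint(0, 3, size=500)

# Step 1: whitening; Step 2: two stacked autoencoder layers (sizes are illustrative)
X_white = zca_whiten(X)
H1, enc1 = train_autoencoder_layer(X_white, n_hidden=32)
H2, enc2 = train_autoencoder_layer(H1, n_hidden=16)

# Step 3: feed-forward classifier on the learned deep features
clf = keras.Sequential([
    layers.Input(shape=(H2.shape[1],)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
clf.fit(H2, y, epochs=20, batch_size=64, verbose=0)
```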