A good representation of audio features is a primary requirement for improving the identification performance of any speech-based system. Most representation learning approaches rely on connectionist systems to learn and extract latent features from speech data. This work presents a hybrid feature extraction approach that combines Mel-Frequency Cepstral Coefficient (MFCC) features with Shifted Delta Cepstral (SDC) coefficient features and feeds the combined features to a Deep Belief Network (DBN) to yield new feature representations of the speech signal. The DBN performs unsupervised feature learning on the extracted MFCC-SDC acoustic features. A 3-layer Back Propagation Neural Network (BPNN) classifier is then initialized from what the hidden layers of the DBN have learned and is used to identify the language of the uttered speech. The proposed approach is evaluated in MATLAB through several experiments on a user-defined database of isolated words in four languages: Tamil, Malayalam, Hindi, and English. The results obtained with the proposed hybrid approach, MFCC-SDC-DBN, are promising. The approach is also compared with a baseline that uses the traditional MFCC-SDC acoustic features directly with a BPNN classifier. The proposed approach achieves an accuracy of 98.1%, whereas the baseline achieves 82%, an absolute improvement of 16.1 percentage points.
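To make the MFCC-SDC stacking step concrete, the following is a minimal Python sketch of how SDC vectors are commonly derived from cepstral frames and appended to them. It is not the authors' MATLAB implementation. The function names `shifted_delta_cepstra` and `mfcc_sdc`, the default parameters (d = 1, P = 3, k = 7), and the edge-clamping rule at utterance boundaries are all assumptions made for illustration; the abstract does not state the configuration actually used. For frame t and block i, the sketch computes c(t + iP + d) − c(t + iP − d) and stacks the k blocks.

```python
def shifted_delta_cepstra(frames, d=1, P=3, k=7):
    """Return one SDC vector (length N*k) per input frame.

    frames: list of T cepstral vectors, each a list of N floats.
    d: delta spread, P: shift between blocks, k: number of blocks
    (default values are illustrative assumptions, not from the paper).
    """
    T = len(frames)

    def frame_at(t):
        # Clamp out-of-range indices to the nearest valid frame.
        # This boundary rule is an assumption, not taken from the paper.
        return frames[min(max(t, 0), T - 1)]

    sdc = []
    for t in range(T):
        vector = []
        for i in range(k):
            ahead = frame_at(t + i * P + d)
            behind = frame_at(t + i * P - d)
            # Delta for block i: c(t + iP + d) - c(t + iP - d)
            vector.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(vector)
    return sdc


def mfcc_sdc(frames, d=1, P=3, k=7):
    """Concatenate each cepstral frame with its SDC vector (N + N*k values)."""
    deltas = shifted_delta_cepstra(frames, d, P, k)
    return [list(f) + s for f, s in zip(frames, deltas)]
```

Under these assumed settings, 7 cepstral coefficients per frame would give a 56-dimensional MFCC-SDC vector (7 static values plus 7 × 7 shifted deltas), which is the kind of input a DBN could then be trained on.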