Speech emotion recognition has become a prominent area of research in the effort to improve the spontaneity and productivity of human-machine interaction. The reliability and performance of such emotion recognition depend largely on the feature extraction and selection processes. Feature extraction plays an important role in exploring and distinguishing audio content. The extracted features should also be robust to a range of disturbances and reliable enough to support an adequate classification system. This paper focuses on three main components of a Speech Emotion Recognition (SER) process. The first is the optimal feature extraction method for a Punjabi SER system. The second is an appropriate feature selection method that selects the effective features from those extracted in the first step and removes the redundant ones, to improve emotion recognition performance. The third is the classification model used for emotion recognition. The scope of this paper is therefore to explain the three main steps of a Punjabi SER system: feature extraction, feature selection, and emotion recognition with a classifier. Results are calculated and compared for a number of feature set combinations, with and without the feature selection process. A total of 10 experiments are carried out, and performance metrics such as precision, recall, F1-score, and accuracy are used to report the results.