This paper presents a novel approach to combining classifier outputs for audio emotion recognition. The proposed ensemble technique combines the confusion matrices of the base classifiers, motivated by the observation that some classifiers with lower overall performance achieve better accuracy on specific classes than others with higher overall accuracy. In this approach, the best results obtained for the different emotion classes by the various classifiers are combined into a single confusion matrix. The performance of the approach was evaluated on three emotional speech databases in different languages: the Berlin emotional speech database (EMO-DB), the Italian emotional speech database (EMOVO-DB), and the Surrey audio-visual expressed emotion database (SAVEE-DB). The openSMILE toolkit was used to extract a total of 8543 audio features, including pitch, energy, intensity, jitter, shimmer, formants, MFCC, MFB, LSP, and spectral features. The features were normalized using min-max normalization, and correlation-based feature selection (CFS) with a best-first search strategy was used for feature reduction. Classification was performed using five base classifiers: SVM, MLP, IBk, AdaBoost, and Random Forest. The experimental results show that the proposed technique outperforms other state-of-the-art methods, achieving classification accuracies over seven emotion classes of 91.8%, 83.7%, and 80.5% on EMO-DB, EMOVO-DB, and SAVEE-DB, respectively.
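The per-class selection idea behind the combined confusion matrix can be sketched as follows. This is a hypothetical illustration, not the paper's exact combination rule: it assumes each base classifier contributes one confusion matrix on a shared test set, and for each emotion class the row from the classifier with the highest per-class recall (diagonal entry divided by row sum) is copied into the combined matrix.

```python
import numpy as np

def combine_confusion_matrices(cms):
    """Combine base-classifier confusion matrices by taking, for each
    class, the row of the classifier with the best per-class recall.
    Illustrative sketch only; `cms` is a list of same-shaped square
    matrices with rows indexed by the true class."""
    cms = [np.asarray(cm, dtype=float) for cm in cms]
    n_classes = cms[0].shape[0]
    combined = np.zeros_like(cms[0])
    for c in range(n_classes):
        # Per-class recall of each classifier for class c
        recalls = [cm[c, c] / cm[c].sum() if cm[c].sum() else 0.0
                   for cm in cms]
        best = int(np.argmax(recalls))  # classifier best on class c
        combined[c] = cms[best][c]
    return combined

# Two toy 3-class confusion matrices (rows = true class):
# classifier A is stronger on classes 0 and 2, B on class 1.
cm_a = [[8, 1, 1], [2, 6, 2], [1, 1, 8]]
cm_b = [[6, 2, 2], [1, 8, 1], [2, 2, 6]]
combined = combine_confusion_matrices([cm_a, cm_b])
```

In this toy setup the combined matrix keeps rows 0 and 2 from the first classifier and row 1 from the second, so its diagonal dominates either individual matrix; with seven emotion classes and five base classifiers the same loop would pick among five candidate rows per class.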