In this article, a novel pitch determination algorithm based on the harmonic differences method (HDM) is proposed. Most algorithms today rely on autocorrelation, cepstrum, or, more recently, convolutional neural networks, and they suffer from limitations (small datasets, wideband-only or narrowband-only operation, musical sounds, temporal smoothing, etc.) as well as accuracy and speed problems. Very few works exploit the spacing between harmonics. HDM is designed for both wideband and exclusively narrowband (telephone) speech and finds the most frequently repeating difference between the harmonics of the speech signal. We use three vowel databases in our experiments: the Hillenbrand Vowel Database, the Texas Vowel Database, and vowels from the TIMIT corpus. We compare HDM with the autocorrelation, cepstrum, YIN, YAAPT, CREPE, and FCN algorithms. Results show that harmonic differences are a reliable and fast choice for robust pitch detection, and that HDM is superior to the other algorithms in most cases.
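The core idea of finding the most repeating difference between harmonics can be sketched as follows. This is a minimal illustration, not the paper's implementation: the FFT size, peak-picking threshold, and histogram binning used here are illustrative assumptions.

```python
import numpy as np

def hdm_pitch(signal, sr, n_fft=4096, min_f0=50.0, max_f0=500.0):
    """Estimate pitch as the most frequently occurring difference between
    spectral peaks (a sketch of the harmonic-differences idea; the
    thresholds and peak picking here are illustrative, not the paper's)."""
    # Hann window to suppress spectral sidelobes before peak picking.
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    # Simple peak picking: local maxima above a relative threshold.
    thr = 0.1 * spectrum.max()
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1]
             and spectrum[i] > spectrum[i + 1]
             and spectrum[i] > thr]
    peak_freqs = freqs[peaks]
    # Collect pairwise differences that fall in the plausible F0 range.
    diffs = [f2 - f1 for i, f1 in enumerate(peak_freqs)
             for f2 in peak_freqs[i + 1:]
             if min_f0 <= f2 - f1 <= max_f0]
    if not diffs:
        return 0.0
    # Histogram the differences; the most populated bin is the estimate.
    n_bins = int((max_f0 - min_f0) / 5.0)  # ~5 Hz bins
    hist, edges = np.histogram(diffs, bins=n_bins, range=(min_f0, max_f0))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])

# Synthetic vowel-like signal: 200 Hz fundamental with four harmonics.
sr = 16000
t = np.arange(0, 0.1, 1.0 / sr)
sig = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 5))
f0 = hdm_pitch(sig, sr)
```

Because every adjacent pair of harmonics is spaced by the fundamental, the histogram bin containing F0 accumulates the most votes, which is what makes the estimate robust even when the fundamental itself is missing (as in telephone speech).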
In this study, a novel filter bank design is proposed for speech emotion recognition to replace the current state-of-the-art MFCC (Mel Frequency Cepstral Coefficients) and Mel filter banks. These novel filter banks are expected to enable substantial improvements in speech emotion recognition applications. Many filter banks have been proposed for speech recognition applications, but these models either contain too many filters or require cumbersome mathematical operations to compute. MFCC requires the calculation of the DCT (Discrete Cosine Transform), and the resulting coefficients are difficult to interpret. Mel filters are easy to interpret, but there are too many of them. The novel filter banks are faster and easier to compute; moreover, they can be interpreted more readily than MFCC and Mel filters. We apply these filter banks with NVIDIA's CNN and ResNet deep convolutional networks. We also implement feature selection, data augmentation, and various techniques for handling imbalanced datasets to show the effectiveness of the proposed filter banks.
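For context on the baseline being replaced, the standard MFCC pipeline can be sketched as below. This illustrates the two points the abstract makes: Mel filter banks typically contain many filters (26 here, an illustrative choice), and MFCC adds a DCT step on top of the filter-bank energies. The novel filter banks themselves are not specified in the abstract and are not implemented here.

```python
import numpy as np

def mel_filter_bank(n_filters, n_fft, sr):
    """Standard triangular Mel filters; 26-40 filters are typical,
    which is the 'too many filters' point the abstract raises."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter centers equally spaced on the Mel scale, mapped back to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, center, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, center):           # rising slope
            fbank[i, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):           # falling slope
            fbank[i, k] = (hi - k) / max(hi - center, 1)
    return fbank

def dct_ii(x):
    """DCT-II: the extra step MFCC needs on top of the log filter-bank
    energies (a naive O(n^2) version for clarity)."""
    n = len(x)
    k = np.arange(n)
    return np.array([np.sum(x * np.cos(np.pi * (k + 0.5) * m / n))
                     for m in range(n)])

# One frame of speech-like noise -> log Mel energies -> MFCCs via DCT.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
n_fft = 512
power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
fbank = mel_filter_bank(26, n_fft, 16000)
log_energies = np.log(fbank @ power + 1e-10)  # interpretable: one value per band
mfcc = dct_ii(log_energies)[:13]              # decorrelated, but harder to interpret
```

The log filter-bank energies each correspond to one frequency band, which is why they are easy to interpret; after the DCT mixes all bands into each coefficient, that direct correspondence is lost.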
In this study, a novel filter bank design is proposed for speech emotion recognition to replace the current state-of-the-art MFCC (Mel Frequency Cepstral Coefficients) and Mel filter banks. These novel filter banks are expected to enable substantial improvements in speech emotion recognition applications. Many filter banks have been proposed for speech recognition applications, but these models either contain too many filters or require cumbersome mathematical operations to compute. MFCC requires the calculation of the DCT (Discrete Cosine Transform), and the resulting coefficients are difficult to interpret. Mel filters are easy to interpret, but there are too many of them. The novel filter banks are faster and easier to compute; moreover, they can be interpreted more readily than MFCC and Mel filters. We apply these filter banks with NVIDIA's CNN model and an SVM-SMO classifier to compare them with MFCC and Mel filter banks. We also implement feature selection, data augmentation, and various techniques for handling imbalanced datasets to show the effectiveness of the proposed filter banks.