Accurately identifying speakers poses several challenges, and the extraction of discriminative features is vital for reliable speaker identification. Deep learning is now widely applied to this task. Complex and noisy speech degrades the performance of Mel Frequency Cepstral Coefficients (MFCCs), which consequently fail to represent speaker characteristics accurately. In this work, a novel text-independent speaker identification system is developed that improves performance by fusing log-Mel spectrum and excitation features. The excitation information arises from the vibration of the vocal folds and is represented by the Linear Prediction (LP) residual. The features extracted from the excitation are the residual phase, sharpness, Energy of Excitation (EoE), and Strength of Excitation (SoE). The extracted features are processed by a dilated convolutional neural network (dilated CNN) to perform identification. Extensive evaluation shows that fusing the excitation features yields better results than existing methods: accuracy reaches 94.12% for 11 complex classes and 91.34% for 80 speakers, and the Equal Error Rate (EER) is reduced to 1.16%. The proposed model is evaluated on the LibriSpeech corpus using MATLAB R2021b and outperforms existing baseline models, achieving an accuracy improvement of 1.34% over the baseline system.
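The two feature streams mentioned above can be sketched in a few lines of NumPy. The following is a minimal illustration, not the authors' exact pipeline: it computes a log-Mel spectrum for a single frame via a triangular mel filterbank, and estimates the LP residual (the excitation signal) by least-squares linear prediction. The constants (40 mel bands, LP order 12, 16 kHz sampling) are assumptions for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrum(frame, sr, n_mels=40):
    # Power spectrum of one frame, pooled through the mel filterbank.
    spec = np.abs(np.fft.rfft(frame)) ** 2
    fb = mel_filterbank(sr, len(frame), n_mels)
    return np.log(fb @ spec + 1e-10)

def lp_residual(frame, order=12):
    # Predict each sample from its `order` predecessors; the prediction
    # error approximates the vocal-fold excitation (LP residual).
    n = len(frame)
    X = np.column_stack(
        [frame[order - k - 1 : n - k - 1] for k in range(order)]
    )
    y = frame[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ a
```

For a strongly periodic signal such as a vowel, the LP residual carries far less energy than the frame itself, since the vocal-tract resonances are well modeled by the predictor; the residual retains mainly the glottal excitation, which is what the EoE and SoE features then quantify.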