Automated Classification of Vowel Category and Speaker Type in the High-Frequency Spectrum

Donai, Jeremy J.; Motiian, Saeid; Doretto, Gianfranco

doi:10.4081/audiores.2016.137

Cited by 5 publications

(4 citation statements)

References 14 publications

“…This may have been due to the fact that since a simple frequency representation was used for classification, the high-frequency region of male vowels contained richer harmonic information (and additional information from which a classification decision could be made) due to lower fundamental frequencies in male signals. The overall findings of the current study are in line with those reported by Donai et al (2016), who reported accurate classification of six vowels produced by a limited number of speakers (two male, two female, and two children) using information above approximately 3.5 kHz.…”

Section: Discussion (supporting)

confidence: 92%

“…In recent years, researchers have been increasingly interested in whether high-frequency cues are robust enough for classification methods. Several approaches have been used, including linear discriminant analyses (Donai and Paschall, 2015) and various machine recognition (e.g., Deshpande and Holambe, 2011; Donai et al, 2016; Itakura 1994, 1995) and segregation techniques (Hu and Wang, 2004). The results of these studies are encouraging, demonstrating that speech information above 3-4 kHz is useful for machine recognition tasks.…”

Section: Introduction (mentioning)

confidence: 99%

“…Donai and Paschall (2015) reported vowel classification using linear discriminant analyses with spectral peak data above 3 kHz and found performance to be significantly above chance for the vast majority of vowels, mirroring the perceptual performance for the male, female, and child produced vowels. Donai et al (2016) investigated the use of high-frequency energy for classifying vowel category and talker type (male, female, or child) in an automated recognition framework. Classification results using mel-frequency cepstral coefficients (MFCCs) extracted from a limited number of high-pass filtered vowel segments, h(Vowel)d (hVd), as input to a support vector machine classifier showed over 90% accuracy when classifying vowel category (six vowels) from a combined set of vowel signals produced by two male, two female, and two child talkers.…”

Section: Introduction (mentioning)

confidence: 99%

Classification of indexical and segmental features of human speech using low- and high-frequency energy

Donai, Paschall, Haider (2023), The Journal of the Acoustical Society of America

Self Cite

The high-frequency region (above 4–5 kHz) of the speech spectrum has received substantial research attention over the previous decade, with a host of studies documenting the presence of important and useful information in this region. The purpose of the current experiment was to compare the presence of indexical and segmental information in the low- and high-frequency region of speech (below and above 4 kHz) and to determine the extent to which information from these regions can be used in a machine learning framework to correctly classify indexical and segmental aspects of the speech signal. Naturally produced vowel segments produced by ten male and ten female talkers were used as input to a temporal dictionary ensemble classification model in unfiltered, low-pass filtered (below 4 kHz), and high-pass filtered (above 4 kHz) conditions. Classification performance in the unfiltered and low-pass filtered conditions was approximately 90% or better for vowel categorization, talker sex, and individual talker identity tasks. Classification performance for high-pass filtered signals composed of energy above 4 kHz was well above chance for the same tasks. For several classification tasks (i.e., talker sex and talker identity), high-pass filtering had minimal effect on classification performance, suggesting the preservation of indexical information above 4 kHz.

“…Computer vision deals with acquiring, processing, and understanding images in order to solve different tasks. Computer vision has a wide range of applications including video gaming [16], in the food industry [17], robotics [18,19,20], biomedical [21,22], and many more [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41].…”

Section: Introduction 11 Problem Definition (mentioning)

confidence: 99%

Domain Adaptation and Privileged Information for Visual Recognition

Motiian

Self Cite

The automatic identification of entities like objects, people or their actions in visual data, such as images or video, has significantly improved, and is now being deployed in access control, social media, online retail, autonomous vehicles, and several other applications. This visual recognition capability leverages supervised learning techniques, which require large amounts of labeled training data from the target distribution representative of the particular task at hand. However, collecting such training data might be expensive, require too much time, or even be impossible. In this work, we introduce several novel approaches aiming at compensating for the lack of target training data. Rather than leveraging prior knowledge for building task-specific models, typically easier to train, we focus on developing general visual recognition techniques, where the notion of prior knowledge is better identified by additional information, available during training. Depending on the nature of such information, the learning problem may turn into domain adaptation (DA), domain generalization (DG), learning using privileged information (LUPI), or domain adaptation with privileged information (DAPI). When some target data samples are available and additional information in the form of labeled data from a different source is also available, the learning problem becomes domain adaptation. Unlike previous DA work, we introduce two novel approaches for the few-shot learning scenario, which require only very few labeled target samples, and even one can be very effective. The first method exploits a Siamese deep neural network architecture for learning an embedding where visual categories from the source and target distributions are semantically aligned and yet maximally separated. The second approach instead extends adversarial learning to simultaneously maximize the confusion between source and target domains while achieving semantic alignment. In complete absence of target data, several cheaply available source datasets related to the target distribution can be leveraged as additional information for learning a task. This is the domain generalization setting. We introduce the first deep learning approach to address the DG problem, by extending a Siamese network architecture for learning a representation of visual categories that is invariant with respect to the sources, while imposing semantic alignment and class separation to maximize generalization performance on unseen target domains. There are situations in which target data for training might come equipped with additional information that can be modeled as an auxiliary view of the data, and that unfortunately is not available during testing. This is the LUPI scenario. We introduce a novel framework based on the information bottleneck that leverages the auxiliary view to improve the performance of visual classifiers. We do so by introducing a formulation that is general, in the sense that can ...

The Perception and Use of High-Frequency Speech Energy: Clinical and Research Implications

Boyd-Pratt, Donai (2020), Perspect ASHA SIGs

Purpose: High-frequency speech energy (above approximately 4–5 kHz) is garnering substantial research attention. This review surveys recent evidence surrounding the presence and use of perceptual information in the high-frequency region. Additionally, clinical and research applications relevant to speech, language, and hearing professionals are discussed. Method: Five databases were used during the search (Medline, CINAHL, WorldCat, ERIC, and Google Scholar). Criteria for study inclusion included (a) peer review, (b) utilization of high-frequency energy (above approximately 4 kHz) during the experimental tasks, and (c) were published from 2014 to present. Fifty-seven articles were included for review, and after further inspection, 13 met the inclusion criteria and were retained. Results: Thirteen peer-reviewed studies provided evidence to support the supposition that important and useable acoustic cues exist in the high-frequency portion of the speech spectrum. Conclusions: Considering the evidence discussed in this document, it is apparent that the high-frequency region contains additional perceptual cues than currently assumed. Specifically, acoustic cues regarding segmental information (vowel and consonant identification), individual speaker identity, and speaker sex are available for use by human listeners and automated machine recognition systems. Additionally, the high-frequency speech region may reduce listening effort and improve speech recognition in noisy listening conditions, particularly when the speech and noise are spatially separated. Therefore, clinicians and researchers should be aware of this information, which can inform clinical practice when fitting amplification devices for various clinical populations and experimental research for speech and hearing scientists.
