Objectives: With the COVID-19 outbreak around the globe and its potential effect on infected patients' voices, this study set out to objectively evaluate and compare the acoustic parameters of voice between healthy and infected people. Methods: Voice samples of 64 COVID-19 patients and 70 healthy Persian speakers producing a sustained vowel /a/ were evaluated. Between-group comparisons were performed using two-way ANOVA and the Wilcoxon rank-sum test. Results: The results revealed significant differences in CPP, HNR, H1H2, F0SD, jitter, shimmer, and MPT values between the COVID-19 patients and the healthy participants. There were also significant differences between the male and female participants in all acoustic parameters except jitter, shimmer, and MPT. No interaction was observed between gender and health status for any of the acoustic parameters. Conclusion: The statistical analysis revealed significant differences between the experimental and control groups. These changes in the acoustic parameters of voice are attributed to insufficient airflow and to increased aperiodicity, irregularity, signal perturbation, and noise, which are consequences of the pulmonary and laryngological involvement seen in patients with COVID-19.
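To illustrate the reported analysis, here is a minimal Python sketch of a two-way ANOVA (health status by gender) followed by a rank-sum comparison for each parameter. The file name and column names (group, gender, cpp, hnr, ...) are assumptions for illustration, not the study's actual data layout.

```python
# Minimal sketch: two-way ANOVA and Wilcoxon rank-sum test per acoustic
# parameter. The CSV layout and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy.stats import ranksums

df = pd.read_csv("acoustic_measures.csv")  # one row per speaker

for param in ["cpp", "hnr", "h1h2", "f0sd", "jitter", "shimmer", "mpt"]:
    # Two-way ANOVA with health status, gender, and their interaction.
    model = ols(f"{param} ~ C(group) * C(gender)", data=df).fit()
    print(param)
    print(sm.stats.anova_lm(model, typ=2))

    # Wilcoxon rank-sum test between the two groups.
    covid = df.loc[df["group"] == "covid", param]
    healthy = df.loc[df["group"] == "healthy", param]
    stat, p = ranksums(covid, healthy)
    print(f"rank-sum statistic = {stat:.2f}, p = {p:.4f}")
```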
This study aimed to develop an artificial intelligence (AI)-based tool for screening COVID-19 patients based on the acoustic parameters of their voices. Twenty-five acoustic parameters were extracted from voice samples of 203 COVID-19 patients and 171 healthy individuals who produced a sustained vowel /a/ for as long as they could after a deep breath. The selected parameters covered several categories: fundamental frequency and its perturbation, harmonicity, vocal tract function, airflow sufficiency, and periodicity. After feature extraction, different machine learning methods were tested. A leave-one-subject-out validation scheme was used to tune the hyper-parameters and record the test-set results, and the models were then compared on accuracy, precision, recall, and F1-score. Based on accuracy (89.71%), recall (91.63%), and F1-score (90.62%), the best model was the feedforward neural network (FFNN); its precision (89.63%) was slightly lower than that of logistic regression (90.17%). Based on these results and the confusion matrices, the FFNN model was employed in the software. This screening tool could be used at home and in public places to check the health of an individual's respiratory system. If related abnormalities are detected in the test taker's voice, the tool recommends seeking medical consultation.
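The described pipeline maps naturally onto standard tooling. Below is a hedged sketch of the model comparison under leave-one-subject-out validation using scikit-learn; the placeholder data, hidden-layer sizes, and other estimator settings are illustrative assumptions, not the study's tuned configuration.

```python
# Sketch: compare logistic regression and a feedforward network (MLP) under
# leave-one-subject-out cross-validation. Data here are random placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(374, 25))       # 25 acoustic parameters per sample
y = rng.integers(0, 2, size=374)     # 1 = COVID-19, 0 = healthy (placeholder)
subjects = np.arange(374)            # one recording per subject here

models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "FFNN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)),
}

for name, model in models.items():
    # Each fold holds out all samples from one subject.
    pred = cross_val_predict(model, X, y, groups=subjects, cv=LeaveOneGroupOut())
    print(f"{name}: acc={accuracy_score(y, pred):.4f} "
          f"prec={precision_score(y, pred):.4f} "
          f"rec={recall_score(y, pred):.4f} "
          f"f1={f1_score(y, pred):.4f}")
```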
Automatic speaker recognition applications have often been described as a 'black box'. This study explores the benefit of two tuning procedures, condition adaptation and reference normalisation, implemented in VOCALISE, an i-vector PLDA automatic speaker recognition system. These procedures enable users to open the black box to a certain degree. Subsets of two 100-speaker databases, one of Czech and the other of Persian male speakers, are used for the baseline condition and for the tuning procedures. The effect of tuning with cross-language material is also examined, as is the effect of simulated voice disguise, achieved by raising the fundamental frequency by four semitones and the resonance characteristics by 8%. The results show better recognition performance (EER) for Persian than for Czech in the baseline condition, but the opposite result in the simulated disguise condition; possible reasons for this are discussed. Overall, the study suggests that both condition adaptation and reference normalisation benefit recognition performance.
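The simulated disguise described above (F0 raised by four semitones, resonances by 8%) can be approximated with Praat's "Change gender" command, here driven from Python via the parselmouth library. This is a sketch under assumed settings, not the study's actual processing script, and the file names are placeholders.

```python
# Sketch: simulate voice disguise by raising F0 four semitones and shifting
# formants up 8%, using Praat's "Change gender" via parselmouth.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speaker.wav")  # placeholder input file

# Current median F0 serves as the reference for the four-semitone raise.
pitch = snd.to_pitch()
median_f0 = call(pitch, "Get quantile", 0, 0, 0.5, "Hertz")

disguised = call(
    snd, "Change gender",
    75, 600,                      # pitch floor / ceiling for analysis (Hz)
    1.08,                         # formant shift ratio: resonances +8%
    median_f0 * 2 ** (4 / 12),    # new pitch median: +4 semitones
    1.0,                          # pitch range factor (unchanged)
    1.0,                          # duration factor (unchanged)
)
disguised.save("speaker_disguised.wav", "WAV")
```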
Many individuals around the world speak two or more languages. This phenomenon adds a fascinating dimension of variability to speech, in both perception and production. But do bilinguals change their voice when they switch from one language to the other? It is typically assumed that while some aspects of the speech signal vary for linguistic reasons, some indexical features remain unchanged across languages. Yet little is known about the influence of language on within- and between-speaker vocal variability. The present study investigated how acoustic parameters of voice quality are structured in the two languages of a bilingual speaker and to what extent such features may vary between bilingual speakers. For this purpose, speech samples of 10 simultaneous Sorani Kurdish-Persian bilingual speakers were acoustically analyzed. Following the psychoacoustic model proposed by Kreiman (2014) and using a series of principal component analyses, we found that the Sorani Kurdish-Persian bilingual speakers followed a similar acoustic pattern in their two languages, suggesting that each speaker has a unique voice but uses the same voice parameters when switching from one language to the other.
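A per-language principal component analysis of this kind can be sketched as follows; the table layout, column names, and feature set are hypothetical stand-ins for the measures of the Kreiman (2014) model.

```python
# Sketch: fit a PCA to voice-quality measures separately for each language
# and compare the resulting component structure. Column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("voice_quality_measures.csv")  # speaker, language, measures...
features = ["h1h2", "cpp", "hnr", "f0", "f0sd", "formant_dispersion"]

for lang in ["sorani_kurdish", "persian"]:
    sub = df[df["language"] == lang]
    z = StandardScaler().fit_transform(sub[features])
    pca = PCA().fit(z)
    print(lang, "explained variance:", pca.explained_variance_ratio_.round(3))
    # Similar loadings across languages would indicate the same voice-quality
    # structure in both of a speaker's languages.
    print(pd.DataFrame(pca.components_, columns=features).round(2))
```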