Statistical analysis of speech is an emerging area of machine learning. In this paper, we tackle the biometric challenge of Automatic Speaker Verification (ASV): differentiating between samples generated by two distinct populations of utterances, those of an authentic human voice and those generated by a synthetic one. Solving this problem from a statistical perspective requires defining a decision rule function and a learning procedure to identify the optimal classifier. Classical state-of-the-art countermeasures rely on strong assumptions, such as stationarity or local stationarity of speech, that rarely hold in practice. We therefore explore a robust non-linear and non-stationary signal decomposition method, the Empirical Mode Decomposition, combined in a novel fashion with Mel-Frequency Cepstral Coefficients and a refined classifier technique, the multi-kernel Support Vector Machine. We undertake substantial real-data case studies covering multiple ASV systems and different datasets, including the ASVSpoof 2019 challenge database. The results demonstrate the advantage of our feature extraction and classifier approach over existing conventional methods in reducing the threat of cyber-attacks perpetrated by synthetic voice replication seeking unauthorised access.
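To illustrate the multi-kernel idea mentioned in the abstract above, the following is a minimal sketch that builds a Gram matrix from a weighted sum of two kernels. The choice of an RBF and a linear kernel, the weights, `gamma`, and the toy feature vectors are all assumptions made for illustration; the abstract does not state which kernels or weights the authors use.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel; gamma is an assumed illustrative value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, y):
    # Plain inner product.
    return sum(a * b for a, b in zip(x, y))

def combined_kernel(x, y, weights=(0.7, 0.3), gamma=0.5):
    # A non-negative weighted sum of valid kernels is itself a valid kernel.
    # The weights here are hypothetical, not taken from the paper.
    w_rbf, w_lin = weights
    return w_rbf * rbf_kernel(x, y, gamma) + w_lin * linear_kernel(x, y)

def gram_matrix(samples, kernel=combined_kernel):
    # Pairwise kernel evaluations over all samples.
    return [[kernel(a, b) for b in samples] for a in samples]

# Toy 2-D feature vectors standing in for per-utterance features.
features = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K = gram_matrix(features)
```

A matrix like `K` could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.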
Objectives: To characterize cervical vestibular evoked myogenic potentials (c-VEMPs) in bone conduction (BC) and air conduction (AC) in healthy children, to compare the responses to adults and to provide normative values according to age and sex. Design: Observational study in a large cohort of healthy children (n = 118) and adults (n = 41). The c-VEMPs were normalized with the individual EMG traces, and the amplitude ratios were modeled with the Royston-Wright method. Results: In children, the amplitude ratios of AC and BC c-VEMP were correlated (r = 0.6, p < 0.001) and their medians were not significantly different (p = 0.05). The amplitude ratio was higher in men than in women for AC (p = 0.04) and BC (p = 0.03). Children had significantly higher amplitude ratios than adults for AC (p = 0.01) and BC (p < 0.001). Normative values for children are shown. Amplitude ratio is age-dependent for AC more than for BC. Confidence limits of interaural amplitude ratio asymmetries were less than 32%. Thresholds were not different between AC and BC (88 ± 5 and 86 ± 6 dB nHL, p = 0.99). Mean latencies for AC and BC were for P-wave 13.0 and 13.2 msec and for N-wave 19.3 and 19.4 msec. Conclusion: The present study provides age- and sex-specific normative data for c-VEMP for children (6 months to 15 years of age) for AC and BC stimulation. Up to the age of 15 years, c-VEMP responses can be obtained equally well with both stimulation modes. Thus, BC represents a valid alternative for vestibular otolith testing, especially in case of air conduction disorders.
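As a rough illustration of the interaural asymmetry figure quoted above, the sketch below computes an asymmetry ratio from two EMG-normalized amplitude ratios. The formula (absolute difference over sum, as a percentage) is the commonly used definition and is assumed here, since the abstract does not spell it out. The function names, the example amplitudes, and the use of 32% as a cutoff are hypothetical.

```python
def asymmetry_ratio(left, right):
    # Interaural asymmetry as a percentage: 100 * |L - R| / (L + R).
    # Assumed definition; the abstract does not state the formula.
    total = left + right
    if total <= 0:
        raise ValueError("amplitude ratios must sum to a positive value")
    return 100.0 * abs(left - right) / total

def within_limit(left, right, limit_percent=32.0):
    # limit_percent mirrors the 32% bound reported in the abstract,
    # applied here as an illustrative cutoff only.
    return asymmetry_ratio(left, right) < limit_percent

# Hypothetical normalized amplitude ratios for the left and right ear.
ar = asymmetry_ratio(1.2, 0.8)
ok = within_limit(1.2, 0.8)
```

With these example values the two ears differ by 0.4 on a combined 2.0, so the asymmetry sits near 20% and falls inside the assumed limit.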
Medical diagnostic methods that utilise modalities of patient symptoms such as speech are increasingly being used for initial diagnosis and for monitoring disease progression. Speech disorders are particularly prevalent in neurodegenerative diseases such as Parkinson’s disease, the focus of this work. We demonstrate state-of-the-art methods that combine elements of statistical time series modelling and signal processing with modern machine learning based on Gaussian process models to accurately detect a core symptom of speech disorder in individuals who have Parkinson’s disease. We show that the proposed methods outperform standard best practices of speech diagnostics in detecting ataxic speech disorders, and we focus the study on a detailed analysis of a well-regarded, publicly available Parkinson’s speech data study, making all our results reproducible. The methodology is based on a specialised technique that is not widely adopted in medical statistics but has found great success in other domains such as signal processing, seismology, speech analysis and ecology. In this work, we present this method from a statistical perspective and generalise it to a stochastic model, which is used to design a test for speech disorders when applied to speech time series signals. As such, this work makes contributions of both a practical and a statistical methodological nature.