Azarov E., Petrovsky A. Training Personal Voice Model of a Speaker with Unified Phonetic Space of Features Using Artificial Neural Network. Abstract. The paper investigates the possibility of creating a personal voice model from transcribed speech samples of a specified speaker. It presents a practical way of building such a speech model and some experimental results of applying the model to voice conversion. The model uses an artificial neural network organized as an autoencoder that establishes a correspondence between the space of speech parameters and the space of possible phonetic states, which is unified for any voice.
The paper presents an approach to the analysis of the modulation spectrum of a voice signal in which the primary acoustic analysis is performed in bands of unequal width. Nonuniform analysis corresponds to the psychoacoustic laws of human sound perception. In the context of modulation spectrum analysis, this approach significantly reduces the resulting number of parameters, which greatly simplifies the task of detecting pathological changes in the voice signal from the modulation spectrum parameters. Two methods are considered for decomposing a signal into frequency bands of unequal width: 1) DFT with channel combination and 2) a nonuniform filter bank. The first method uses a fixed time window for the analysis of all frequency components, while in the second method the time-frequency analysis plan is consistent with the Bark critical-band scale. For each method, a practical signal analysis scheme has been developed and described. The paper presents experimental data on applying the developed modulation spectrum analysis schemes to the problem of detecting pathology in a speech signal. The parameters of the modulation spectrum served as features for a classifier based on linear discriminant analysis. Three different voice databases were used in the experiment (in two cases the pathology was the neurological disease ALS (amyotrophic lateral sclerosis), and in the third case, diseases of the larynx). The modulation spectrum parameters obtained in the DFT-based scheme with channel combination proved preferable for classification with a small number of features; however, higher accuracy (as the number of features increased) was achieved with the parameters obtained in the scheme based on a nonuniform filter bank.
In all cases, the obtained classifiers were highly accurate (more than 97%). The results show that a nonuniform time-frequency representation is preferable when the analyzed signal is a sustained vowel phonation, since it provides higher accuracy of pathology detection using fewer modulation parameters.
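The first decomposition method mentioned above (DFT with channel combination) can be illustrated with a minimal Python sketch. It is not the authors' scheme: the function names, the frame length, the hop size, the number of bands, and the use of Traunmüller's Bark approximation are all assumptions made here for illustration.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Traunmüller's approximation of the Bark scale (an assumed choice,
    # not necessarily the mapping used in the paper).
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def modulation_spectrum(x, fs, frame_len=256, hop=64, n_bands=8):
    """Hypothetical sketch: modulation spectrum via DFT with channel combination."""
    # 1) Short-time DFT with one fixed window for all frequency components.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_bins)

    # 2) Channel combination: sum DFT bins into bands of equal width in Bark,
    #    which makes them unequal in Hz.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)
    band_idx = np.clip(np.digitize(bark, edges) - 1, 0, n_bands - 1)
    envelopes = np.zeros((n_frames, n_bands))
    for b in range(n_bands):
        envelopes[:, b] = power[:, band_idx == b].sum(axis=1)

    # 3) Modulation spectrum: DFT of each band envelope along the time axis,
    #    after removing the mean so the DC term does not dominate.
    envelopes = envelopes - envelopes.mean(axis=0)
    mod = np.abs(np.fft.rfft(envelopes, axis=0))  # (n_mod_bins, n_bands)
    mod_freqs = np.fft.rfftfreq(n_frames, d=hop / fs)
    return mod_freqs, mod
```

Each column of `mod` describes how the energy in one nonuniform band fluctuates over time. Entries of this matrix could then serve as classifier features, for example for linear discriminant analysis as described in the abstract.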
The paper investigates the problem of voice activity detection in a noisy sound signal. An extremely compact convolutional neural network with only 385 trainable parameters is proposed. The proposed model does not require many computational resources, which allows it to be used within the “internet of things” concept on compact low-power devices. At the same time, the model provides state-of-the-art detection accuracy. These properties are achieved by using a special convolutional layer that takes into account the harmonic structure of voiced speech. This layer also eliminates redundancy in the model because it is invariant to changes in fundamental frequency. The model performance is evaluated in various noise conditions with different signal-to-noise ratios. The results show that the proposed model provides higher accuracy than the voice activity detection model from the WebRTC framework by Google.
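The abstract does not specify how the harmonic-aware layer is built. One common way to obtain invariance to fundamental frequency is sketched below, purely as an assumption. On a logarithmic frequency axis, harmonic k of a fundamental f0 lies at a fixed offset log2(k) above f0, so a kernel with taps at those offsets responds identically to any f0, only at a shifted position. The function name, the fixed (non-trainable) unit weights, and the parameters are hypothetical.

```python
import numpy as np

def harmonic_comb_response(log_spec, bins_per_octave, n_harmonics=4):
    """Hypothetical sketch: sum a log-frequency spectrum at harmonic offsets.

    log_spec is a 1-D magnitude spectrum sampled on a log2 frequency axis
    with bins_per_octave bins per octave. Position i of the output is large
    when bin i looks like a fundamental with harmonics above it.
    """
    # Offsets of harmonics 1..n_harmonics relative to the fundamental, in bins.
    offsets = np.round(
        bins_per_octave * np.log2(np.arange(1, n_harmonics + 1))
    ).astype(int)
    n = len(log_spec) - offsets[-1]
    # Same taps for every candidate fundamental: a dilated 1-D convolution
    # with unit weights (a trainable layer would learn these weights).
    return sum(log_spec[o : o + n] for o in offsets)
```

Because the same taps are reused at every position, one small kernel covers all fundamental frequencies, which is one plausible reading of how such a layer removes redundancy from the model.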