И. С. Азаров scite author profile

И. С. Азаров

4 Publications

2 Citation Statements Received

0 Citation Statements Given


Affiliations

Belarusian State University of Informatics and Radioelectronics

Publications


Training Personal Voice Model of a Speaker with Unified Phonetic Space of Features Using Artificial Neural Network

Азаров¹,

Петровский²

2014

SPIIRAS Proc.


Azarov E., Petrovsky A. Training Personal Voice Model of a Speaker with Unified Phonetic Space of Features Using Artificial Neural Network. Abstract. The paper investigates the possibility of creating a personal voice model from transcribed speech samples of a specified speaker. It presents a practical way of building such a speech model and some experimental results of applying the model to voice conversion. The model uses an artificial neural network organized as an autoencoder that establishes a correspondence between the space of speech parameters and the space of possible phonetic states, which is unified for any voice.


Pitch modification of speech signal using harmonic model with time-varying parameters

Азаров¹,

Vashkevich²,

Likhachev³

et al. 2014

SPIIRAS Proc.


Voice Pathology Detection based on Analysis of Modulation Spectrum in Critical Bands

Vashkevich¹,

Азаров²

2020

SPIIRAS Proc.


The paper presents an approach to analyzing the modulation spectrum of a voice signal in which the primary acoustic analysis is performed in bands of unequal width. Nonuniform analysis corresponds to the psychoacoustic laws of human sound perception. In modulation spectrum analysis, this approach significantly reduces the resulting number of parameters, which greatly simplifies the task of detecting pathological changes in the voice signal from those parameters. Two methods are considered for decomposing a signal into frequency bands of unequal width: 1) DFT with channel combination and 2) a nonuniform filter bank. The first method uses a fixed time window for the analysis of all frequency components, while in the second method the time-frequency analysis plan is consistent with the Bark critical-band scale. For each method, a practical signal analysis scheme has been developed and described. The paper presents experimental data on applying the developed schemes to the problem of detecting pathology in a speech signal. The modulation spectrum parameters served as features for a classifier based on linear discriminant analysis. Three different voice databases were used in the experiment (in two cases the pathology was the neurological disease ALS (amyotrophic lateral sclerosis), and in the third case, diseases of the larynx). The parameters obtained in the DFT-based scheme with channel combination proved preferable for classification with a small number of features; however, with a larger number of features, the parameters obtained in the scheme based on a nonuniform filter bank gave higher accuracy.
In all cases, the obtained classifiers were highly accurate (more than 97%). The results show that a nonuniform time-frequency representation is preferable when the analyzed signal is a sustained vowel phonation, since it provides higher accuracy of pathology detection using fewer modulation parameters.
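
The first decomposition method named in the abstract, DFT with channel combination, can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' scheme: the frame length, hop size, the Zwicker-Terhardt Bark approximation, and the function names (hz_to_bark, band_envelopes, modulation_spectrum) are all chosen here for illustration. DFT bins from a fixed-window short-time analysis are summed into one channel per Bark band, and the modulation spectrum is then taken as the DFT of each band's envelope over time.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumed choice)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def band_envelopes(signal, fs, frame_len=512, hop=128):
    """Fixed-window short-time DFT, then DFT bins combined per Bark band."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_bins)
    bin_freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    band_index = np.floor(hz_to_bark(bin_freqs)).astype(int)
    n_bands = band_index.max() + 1
    env = np.zeros((n_frames, n_bands))
    for b in range(n_bands):
        # Channel combination: sum the power of all bins in band b.
        env[:, b] = power[:, band_index == b].sum(axis=1)
    return env

def modulation_spectrum(env, fs, hop):
    """Magnitude DFT of each band envelope along the time (frame) axis."""
    env = env - env.mean(axis=0, keepdims=True)             # remove DC per band
    mod = np.abs(np.fft.rfft(env, axis=0))                  # (n_mod_bins, n_bands)
    mod_freqs = np.fft.rfftfreq(env.shape[0], d=hop / fs)   # Hz
    return mod_freqs, mod

# Synthetic check signal: a 1 kHz tone amplitude-modulated at 8 Hz.
fs, hop = 16000, 128
t = np.arange(2 * fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 8.0 * t)) * np.cos(2 * np.pi * 1000.0 * t)

env = band_envelopes(x, fs, hop=hop)
mod_freqs, mod = modulation_spectrum(env, fs, hop)
strongest_band = int(np.argmax(env.sum(axis=0)))
peak_hz = float(mod_freqs[np.argmax(mod[:, strongest_band])])
```

On the synthetic tone, the strongest band is the one containing 1 kHz and its modulation spectrum peaks near the 8 Hz modulation rate. Because every band shares the same fixed analysis window, the number of parameters drops from one per DFT bin to one per Bark band, which is the reduction the abstract refers to.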


Voice activity detection in noisy conditions using tiny convolutional neural network

Вашкевич

Азаров

2020

Informatika (Minsk)


The paper investigates the problem of voice activity detection in a noisy sound signal. An extremely compact convolutional neural network is proposed. The model has only 385 trainable parameters. The proposed model does not require many computational resources, which allows it to be used on compact low-power devices as part of the “internet of things” concept. At the same time, the model provides state-of-the-art voice activity detection accuracy. These properties are achieved by using a special convolutional layer that takes into account the harmonic structure of voiced speech. This layer also eliminates redundancy in the model because it is invariant to changes in fundamental frequency. The model's performance is evaluated in various noise conditions with different signal-to-noise ratios. The results show that the proposed model provides higher accuracy than the voice activity detection model from Google's WebRTC framework.

