2014
DOI: 10.15622/sp.36.8

Training Personal Voice Model of a Speaker with Unified Phonetic Space of Features Using Artificial Neural Network

Authors: Azarov E., Petrovsky A.

Abstract: The paper investigates the possibility of creating a personal voice model using transcribed speech samples of a specified speaker. It presents a practical way of building such a speech model and some experimental results of applying the model to voice conversion. The model uses an artificial neural network organized as an autoencoder that establishes correspondence …

Citations: cited by 2 publications (3 citation statements)
References: 4 publications
“…The main reason for such a noticeable progress is the emergence of the possibility of using huge speech corpora (due to the multiple reduction in cost and acceleration of the computational process). In 2014, another new method for the automatic creation of a speech synthesizer from text has been proposed, that makes it possible to form voice and language models based on speech recordings and their transcriptions [5].…”
Section: Voice Deepfake Text-to-speech Synthesis Spoofing and Medial ... (citation type: mentioning)
confidence: 99%
“…The motives for such an act may be hooligan motives, revenge, blackmail, defamation, discrediting, etc. According to the current Russian legislation, such speech actions that constitute an offense may include, in particular: insult; slander; slander against a judge, a juror, a prosecutor, an investigator, a person conducting an inquiry, a bailiff; insulting a government official; insulting a military member; public calls for terrorist activities, public justification of terrorism or propaganda of terrorism; public calls for extremist activities; public calls for action aimed at violating the territorial integrity of the Russian Federation; incitement to hatred or enmity, as well as humiliation of human dignity; the threat of committing a terrorist act; deliberately false reporting of an act of terrorism; persuading, recruiting or otherwise involving a person in terrorist activities; persuading, recruiting or otherwise involving a person in the activities of an extremist community, extremist organization; compulsion to acts of a sexual nature; the threat of murder or infliction of serious bodily injury; threat or violent actions due to the administration of justice or the production of a preliminary investigation; incitement to suicide; inducement to commit suicide or assistance in committing suicide; propaganda of narcotic drugs, psychotropic substances or their precursors, plants containing narcotic drugs or psychotropic substances or their precursors, and their parts containing narcotic drugs or psychotropic substances or their precursors, new potentially dangerous psychoactive substances; etc. Thus, an intellectual and (or) material forgery (the creation of a forged audio document) is made, the right of a citizen to an individual voice, individual authorship of a message is violated.…”
Section: Classification Of Voice Deepfakes and Types Of Their Creatio... (citation type: mentioning)
confidence: 99%
“…defines the reference pattern "A1": $f(Sw^{(1)}_{A1,A2})$, $f(Sw^{(1)}_{A1,B1})$, $f(Sw^{(1)}_{A1,B2})$, $f(Sw^{(1)}_{A1,E})$, $f(Sw^{(1)}_{A1,M})$, $f(Sw^{(1)}_{A1,N})$, $f(Sw^{(1)}_{A1,Z})$. Accordingly, the active output of the i-th neuron of the second layer determines the similarity of the input element being recognized to the reference pattern "A1".…”
Section: Information and Control Systems (citation type: unclassified)