A Framework for Recording Audio-Visual Speech Corpora with a Microphone and a High-Speed Camera

Karpov, Alexey; Kipyatkova, Irina; Železný, Miloš

doi:10.1007/978-3-319-11581-8_6

Cited by 7 publications

(3 citation statements)

References 10 publications


“…However, robust and reliable automatic Russian speech recognition systems, practically, do not exist. The development of Russian speech technologies is heavily influenced by the nature of the language, such as absence of strict grammatical constructions in sentences, huge amount of word formation rules, large number of exceptions, and the variability of Russian speech in the presence of dialects and accents [26].…”

Section: Data (mentioning)

confidence: 99%

Developing of a Software–Hardware Complex for Automatic Audio–Visual Speech Recognition in Human–Robot Interfaces

Ivanko

Ryumin

Karpov

2021

Electromechanics and Robotics

Self Cite


In recent years, speech has become an increasingly common input modality in human-robot interfaces, and users value this natural form of communication. As the technology matures, such "native" human-robot interfaces are likely to become more widespread. In this paper, we propose an architecture and develop a software-hardware complex for automatic speech recognition with small and medium-sized vocabularies, intended for use in robots. One distinctive feature of the complex is an audio-visual speech synchronization module, which (1) detects the speech signal in the audio data and (2) accounts for the natural asynchrony between acoustic and visual speech; on this basis it can (3) synchronize the speech segments of the audio and video streams in time. Another distinctive feature is a modality fusion module, which (1) combines informative data from the audio and video signals and (2) adjusts the weight of each modality according to the SNR level, which helps keep recognition accuracy high even in acoustically noisy conditions.
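The SNR-dependent weighting of modalities described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic mapping from SNR to audio weight, the parameters `midpoint_db` and `slope`, the function names, and the toy log-likelihood values are all assumptions chosen for illustration.

```python
import math


def audio_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Map an SNR estimate (dB) to an audio weight in (0, 1).

    The logistic shape and both parameters are illustrative assumptions,
    not values taken from the cited paper.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_scores(audio_loglik, video_loglik, snr_db):
    """Combine per-word log-likelihoods as a weighted sum of both modalities."""
    w = audio_weight(snr_db)
    return {
        word: w * audio_loglik[word] + (1.0 - w) * video_loglik[word]
        for word in audio_loglik
    }


def recognize(audio_loglik, video_loglik, snr_db):
    """Return the word with the highest fused score."""
    fused = fuse_scores(audio_loglik, video_loglik, snr_db)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Hypothetical scores: the audio model prefers "stop", the video model "start".
    audio = {"stop": -4.0, "start": -6.0}
    video = {"stop": -7.0, "start": -3.0}
    print(recognize(audio, video, snr_db=30.0))  # clean audio: audio dominates
    print(recognize(audio, video, snr_db=-5.0))  # noisy audio: video dominates
```

With clean audio the weight approaches 1 and the audio hypothesis wins; at low SNR the weight approaches 0 and the decision follows the visual stream.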


“…Works [45][46][47][48] describe in more detail the approaches to extracting visual features used in tasks such as detecting the speaker's lip contour, structural viseme analysis of Russian speech, and others. Publications [49,50] also examine visual feature extraction methods in the context of lip-reading.…”

Section: Fig. 1. General structure of the audio-visual recognition system… (unclassified)

Analysis of multimodal fusion techniques for audio-visual speech recognition

Ivanko

Kipyatkova

Ronzhin

et al.

2016

Naučno-teh. vestn. inf. tehnol. meh. opt.


“…The number of viseme classes used depends on the language; for Russian, 10 to 14 classes have typically been used [7][8][9]. In our experiments we used from 2 (a split into vowels and consonants) to 48 viseme classes (the number of phonemes), in steps of 2.…”

unclassified

Accuracy increase for automatic visual Russian speech recognition: viseme classes optimization

Викторович

Валерьевич

Анатольевич

2018

Naučno-teh. vestn. inf. tehnol. meh. opt.


Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 2, pp. 346-349 (in Russian). doi: 10.17586/2226-1494-2018

Abstract: Many ongoing studies address which viseme classes make automatic lipreading most effective. The paper proposes a structured approach to developing speaker-dependent viseme classes. This method…

