2022
DOI: 10.3390/s22082938

End-to-End Lip-Reading Open Cloud-Based Speech Architecture

Abstract: Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech recognition application programming interfaces (OCSR APIs). Noise-robust ASRs for application in different environments are being developed. This study proposes noise-robust OCSR APIs based on an end-to-end lip-reading architecture for practical applications in variou…

Cited by 5 publications (5 citation statements)
References 46 publications
“…To evaluate the proposed model, three features were combined to compare the training step, error rate, and accuracy, and SNR evaluation was performed by synthesizing speeches in nine noise scenario environments that could be used in a real environment. Consequently, the proposed model exhibited the most stable convergence process and superior performance without significant changes in parameters and training time compared to other models, and it proved to be more robust to noise than previous studies [30,34]. In particular, by adding a new specific point, the log-Mel spectrogram information, the model exhibited a performance improvement of approximately 1.6%-1.2% over the performance of the model that used only two existing features (word embedding and lip movement).…”
Section: Discussion (mentioning)
confidence: 83%
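
The quoted passage says noisy speech was synthesized for SNR evaluation but does not describe the procedure. One common way to do this is to scale a noise recording so that the ratio of speech power to noise power matches a target SNR in decibels, then add it to the clean speech. The following is a minimal sketch under that assumption; the function names `mix_at_snr` and `measure_snr_db` and the list-of-floats sample representation are hypothetical choices for illustration, not the cited authors' code.

```python
import math
import random


def _mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def measure_snr_db(signal, noise):
    """SNR in dB between a clean signal and a noise sequence."""
    return 10.0 * math.log10(_mean_power(signal) / _mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to the target SNR (dB) relative to `speech`, then add.

    Hypothetical helper: returns (mixed, scaled_noise) so the achieved
    SNR can be checked afterwards.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = _mean_power(speech)
    p_noise = _mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Target noise power is p_speech / 10^(snr_db / 10); gain is an amplitude factor.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


# Usage with synthetic data (a sine wave stands in for speech).
rng = random.Random(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
mixed, scaled = mix_at_snr(speech, noise, snr_db=5.0)
print(round(measure_snr_db(speech, scaled), 6))
```

Repeating the call over a list of target SNR values and noise recordings would give one noisy copy of each utterance per scenario; which noise types and SNR levels the cited study used is not stated in the excerpt.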
“…The proposed audio recognition module is illustrated in Figure 3. In previous studies [34,35], we used an open cloud-based speech recognition API using Microsoft's Azure Cognitive Services API [19], which had approximately 5%-10% better word recognition rates than Google Assistant and Amazon Transcribe. To mitigate the impact of performance changes over time, a new API that surpasses the current API is provided whenever available.…”
Section: Audio Module (mentioning)
confidence: 99%
“…In this study, an end-to-end visual speech recognition-based interaction system for speech interaction in a virtual aquarium environment was proposed. Recognized words were vectored by combining pretrained word embedding with Microsoft API, which showed the best performance and dense dispersion in previous studies [45]. Feature vectorization was performed on the image sequence input through video via the visual processing module, and word vectorization was combined to output the predicted word.…”
Section: Discussion (mentioning)
confidence: 99%
“…In addition, the performance of the OCSR API is constantly updated by many companies that provide API services, and it depends on the study date and the type of training data. Therefore, we used the Microsoft Azure API, which has been proven to be superior in the results of previous studies [45]. In addition, the existing speech API can be replaced if another speech API with better performance is released and is not affected by performance changes over time.…”
Section: Architecture Of the Proposed System (mentioning)
confidence: 99%
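
Several of the statements above note that the speech API can be replaced when a better one is released. The excerpts do not show how the citing systems achieve this, so the following is only a sketch of one way to keep the backend swappable: the audio module depends on a small interface rather than on a specific vendor. All names here (`SpeechBackend`, `CannedBackend`, `AudioModule`) are hypothetical, and no real cloud API is called.

```python
from typing import Protocol


class SpeechBackend(Protocol):
    """Hypothetical interface that any speech recognition backend must satisfy."""

    name: str

    def transcribe(self, audio: bytes) -> str:
        ...


class CannedBackend:
    """Stand-in backend that returns a fixed transcript.

    A real implementation would send `audio` to a cloud speech service here.
    """

    def __init__(self, name: str, transcript: str) -> None:
        self.name = name
        self._transcript = transcript

    def transcribe(self, audio: bytes) -> str:
        return self._transcript


class AudioModule:
    """Audio recognition module that delegates to whichever backend is set."""

    def __init__(self, backend: SpeechBackend) -> None:
        self.backend = backend

    def swap_backend(self, backend: SpeechBackend) -> None:
        # Replacing the backend does not change any calling code.
        self.backend = backend

    def recognize(self, audio: bytes) -> str:
        return self.backend.transcribe(audio)


# Usage: swap one stand-in backend for another.
module = AudioModule(CannedBackend("backend-a", "hello"))
print(module.backend.name, module.recognize(b"\x00\x01"))
module.swap_backend(CannedBackend("backend-b", "hello there"))
print(module.backend.name, module.recognize(b"\x00\x01"))
```

With this shape, the downstream steps that consume the recognized words only see the string returned by `recognize`, so a change of provider stays local to one class.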