Edge Container for Speech Recognition

Beňo, L.; Pribiš, Rudolf; Drahoš, Peter

doi:10.3390/electronics10192420

Cited by 4 publications

(3 citation statements)

References 11 publications

“…Beňo et al. [79] describe an implementation of Microsoft Cognitive Speech Service on the edge utilizing Microsoft Azure services. The solution is made up of two containers.…”

Section: Virtual Assistants (mentioning)

confidence: 99%

“… | Year | Application Area | Description | EI Level
[75] | 2023 | Federated learning | Dynamic FL deployment and learning scheme | 4
[76] | 2021 | Robotics | Design methodology for ROS-based applications | 4
[77] | 2021 | Healthcare | Readmission prediction system for healthcare facilities | 3
[78] | 2022 | Healthcare | Electronic health records decomposing the patient's body into containers | -
[79] | 2021 | Virtual assistant | Voice control for human-machine interaction | 3
[81] | 2022 | Composite AI | Ontology model for development of multi-agent AI systems | -
[82] | 2018 | Healthcare | Human activity recognition | 3
[83] | 2021 | Security | Re-identification of people across multiple cameras | 2
[84] | 2022 | Wildfire modelling | A federation architecture to enable a composable infrastructure | -
[85] | 2019 | Computer vision | Architecture for image processing | 3 …”

Section: Reference (mentioning)

confidence: 99%

Containerization in Edge Intelligence: A Review

Urblik, Kajati, Papcun et al. 2024

Electronics

The onset of cloud computing brought with it an adoption of containerization—a lightweight form of virtualization, which provides an easy way of developing and deploying solutions across multiple environments and platforms. This paper describes the current use of containers and complementary technologies in software development and the benefits it brings. Certain applications run into obstacles when deployed on the cloud due to the latency it introduces or the amount of data that needs to be processed. These issues are addressed by edge intelligence. This paper describes edge intelligence, the deployment of artificial intelligence close to the data source, the opportunities it brings, along with some examples of practical applications. We also discuss some of the challenges in the development and deployment of edge intelligence solutions and the possible benefits of applying containerization in edge intelligence.

“…A powerful computer with graphics processing units (GPUs) is mainly used to train the ASR model to achieve a word error rate (WER) of less than a few percent using hundreds of millions of weights. Most of the high-accuracy ASR models are full-context models, which wait to hear the complete utterance before generating output [3][4][5][6][7][8]. On the contrary, streaming ASR models try to generate output as fast as possible without waiting for the completion of utterance [9,10].…”

Section: Introduction (mentioning)

confidence: 99%

A Low-Latency Streaming On-Device Automatic Speech Recognition System Using a CNN Acoustic Model on FPGA and a Language Model on Smartphone

et al. 2022

This paper presents a low-latency streaming on-device automatic speech recognition system for inference. It consists of a hardware acoustic model implemented in a field-programmable gate array, coupled with a software language model running on a smartphone. The smartphone works as the master of the automatic speech recognition system and runs a three-gram language model on the acoustic model output to increase accuracy. The smartphone calculates and sends the Mel-spectrogram of an audio stream with 80 ms unit input from the built-in microphone of the smartphone to the field-programmable gate array every 80 ms. After ~35 ms, the field-programmable gate array sends the calculated word-piece probability to the smartphone, which runs the language model and generates the text output on the smartphone display. The worst-case latency from the audio-stream start time to the text output time was measured as 125.5 ms. The real-time factor is 0.57. The hardware acoustic model is derived from a time-depth-separable convolutional neural network model by reducing the number of weights from 115 M to 9.3 M to decrease the number of multiply-and-accumulate operations by two orders of magnitude. Additionally, the unit input length is reduced from 1000 ms to 80 ms, and to minimize the latency, no future data are used. The hardware acoustic model uses an instruction-based architecture that supports any sequence of convolutional neural network, residual network, layer normalization, and rectified linear unit operations. For the LibriSpeech test-clean dataset, the word error rate of the hardware acoustic model was 13.2% and for the language model, it was 9.1%. These numbers were degraded by 3.4% and 3.2% from the original convolutional neural network software model due to the reduced number of weights and the lowering of the floating-point precision from 32 to 16 bit. The automatic speech recognition system has been demonstrated successfully in real application scenarios.
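
The word error rate figures quoted in the abstract above are conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, assuming whitespace tokenization and no text normalization. It is an illustration only, not the evaluation code used by the authors, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition the value can exceed 1.0 when the hypothesis contains many inserted words, which is why it is reported as a rate rather than as an accuracy.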
