2020
DOI: 10.3390/app10124091

Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)- Based Voice-Activity Detector

Abstract: This paper aims to design an online, low-latency, and high-performance speech recognition system using a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client model and a context-sensitive-chunk-based approach. The speech recognition server manages a main thread and a decoder thread for each client, plus one worker thread. The main thread communicates with the connected client, extracts speech features, and buffers the features. The decoder thread performs speech …
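The thread layout described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract is truncated, so the roles assigned here to the decoder thread and the worker thread (the decoder asks a single shared worker for acoustic scores) are assumptions, and `extract_features`, `acoustic_score`, and `run_server` are hypothetical names.

```python
import queue
import threading


def extract_features(frame):
    # Hypothetical stand-in for real speech feature extraction.
    return float(frame)


def acoustic_score(feature):
    # Hypothetical stand-in for the acoustic model.
    return -abs(feature)


def main_thread(frames, feature_buf):
    # Per-client main thread: receive frames, extract features, buffer them.
    for frame in frames:
        feature_buf.put(extract_features(frame))
    feature_buf.put(None)  # sentinel: end of this client's stream


def decoder_thread(client_id, feature_buf, work_q, results):
    # Per-client decoder thread (assumed role): request a score for each
    # buffered feature from the shared worker and collect the replies.
    reply_q = queue.Queue()
    scores = []
    while (feature := feature_buf.get()) is not None:
        work_q.put((feature, reply_q))
        scores.append(reply_q.get())
    results[client_id] = scores


def worker_thread(work_q):
    # Single worker thread (assumed role): serve score requests from all clients.
    while (item := work_q.get()) is not None:
        feature, reply_q = item
        reply_q.put(acoustic_score(feature))


def run_server(clients):
    # clients maps a client id to the list of frames that client sends.
    work_q = queue.Queue()
    results = {}
    worker = threading.Thread(target=worker_thread, args=(work_q,))
    worker.start()
    per_client = []
    for client_id, frames in clients.items():
        feature_buf = queue.Queue()
        per_client.append(
            threading.Thread(target=main_thread, args=(frames, feature_buf)))
        per_client.append(
            threading.Thread(target=decoder_thread,
                             args=(client_id, feature_buf, work_q, results)))
    for thread in per_client:
        thread.start()
    for thread in per_client:
        thread.join()
    work_q.put(None)  # stop the worker once every client has finished
    worker.join()
    return results
```

Calling `run_server({"a": [1, 2, 3], "b": [4]})` starts two threads per client and one shared worker, then returns the per-client scores keyed by client id.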

Cited by 7 publications (11 citation statements)
References: 45 publications
“…For the batch processing of $\mathbb{Y}^l$, the restricted range is defined as follows: $s^l = \min_{1 \le i \le U,\, 1 \le j \le B} s^l_{i,j}$, $e^l = \max_{1 \le i \le U,\, 1 \le j \le B} e^l_{i,j}$. The time‐restricted CTC‐prefix score can be computed in a small amount of the calculation. For a small but frequent calculation in a GPU device, memory transfer load between CPU and GPU can be a bottleneck [35]. Therefore, the calculation of the CTC‐prefix score is moved from the GPU device to a CPU device to reduce the GPU memory transfer load.…”
Section: Rapid and Efficient Transformer‐based Speech Recognition
Citation type: mentioning
confidence: 99%
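As a small worked example of the restricted range quoted above, the sketch below takes the minimum start index and the maximum end index over a U x B grid. The excerpt does not say what U and B denote, so the nested-list layout (`starts[i][j]`, `ends[i][j]`) and the function name `restricted_range` are assumptions made for illustration.

```python
def restricted_range(starts, ends):
    # starts[i][j] and ends[i][j] hold the per-entry start and end indices
    # for i in 1..U and j in 1..B (here 0-based nested lists).
    s_l = min(min(row) for row in starts)  # earliest start over all entries
    e_l = max(max(row) for row in ends)    # latest end over all entries
    return s_l, e_l


# Two rows (U = 2) and two columns (B = 2) of hypothetical frame indices.
print(restricted_range([[3, 5], [2, 7]], [[10, 12], [9, 15]]))  # (2, 15)
```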
“…These methods suffer when the signal‐to‐noise ratio changes dynamically. In our previous work [35], the outputs of the acoustic model (AM) were successfully employed to detect voice activity in DNN‐HMM‐based ASR systems.…”
Section: Rapid and Efficient Transformer‐based Speech Recognition
Citation type: mentioning
confidence: 99%
“…Multithreading is one of the basic mechanisms for parallel computing, which is possible on one processor and on several processors [3,4]. It is used very often in utility applications (mentioned in the rest of the work), but also in scientific research [5]. Popular, commonly used open-source multithreaded applications are a very good source for research on undesirable phenomena in multithreaded applications.…”
Section: Introduction
Citation type: mentioning
confidence: 99%