Interspeech 2018
DOI: 10.21437/interspeech.2018-2457

BUT OpenSAT 2017 Speech Recognition System

Cited by 4 publications (5 citation statements)
References: 0 publications
“…We can see that the overall behavior of individual systems stays similar and the input 0-gram augmentation brings clearly the largest gain. During the work on this paper, we have applied the 0-gram input augmentation to speech recognition in the OpenSAT challenge [15]. This task is considerable easier, with WER at around 10 %.…”
Section: Chime-6 Rescoring Results (mentioning)
confidence: 99%
“…Graves [17] first tried to use BLSTM for acoustic modeling of speech recognition and achieved the best recognition performance at that time on the TIMIT corpus. Subsequently, many researchers have studied the low-resource speech acoustic modeling of BLSTM [18]-[21]. Although the BLSTM has achieved good results, the structure has a large number of parameters and some complex training mechanisms.…”
Section: Recurrent Neural Network (mentioning)
confidence: 99%
“…On contrary to the pilot OpenSAT evaluations conducted in 2017 [5], where the target data was real fireman-dispatcher communication from the Charleston Sofa Super Store Fire in 2007, for Open-SAT 2019, NIST prepared simulated public safety communications collection ("SAFE-T") specifically designed for speech analytic systems. The data is intended to simulate a combination of characteristics found in public safety communications: background noises, channel transmission noises and speaking characteristics such as stress or sense of urgency.…”
Section: Public Safety Communications (mentioning)
confidence: 99%
“…Various classical hybrid DNN-HMM (Hidden Markov Model) speech recognizers were trained in Kaldi toolkit [6]. We decided to select Factorized Time Delay NN (TDNN-F) based architectures [7] as we were consistently obtaining superior performance to recurrent NN architectures used in last OpenSAT evaluations [5]:…”
Section: Hybrid Acoustics Models (mentioning)
confidence: 99%