ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8682256

CRF-based Single-stage Acoustic Modeling with CTC Topology

Cited by 27 publications (39 citation statements)
References 20 publications
“…Hu et al [17] used several improved DARTS-based methods to find efficient context offset and bottleneck dimension for acoustic model. Zheng et al [18] proposed straight-through gradient search beyond SNAS [24] and ProxylessNAS [25] and trained acoustic model with CRF-CTC [26], but the search space was still limited to receptive field and dimension of TDNN-F only. Kim et al [19] proposed a relatively complex search space: evolved Transformer, split the encoder and decoder of Transformer into left and right branches and iterated the population with progressive dynamic hurdles (PDH) [4].…”
Section: NAS in Speech Field | mentioning
confidence: 99%
“…There are recent attempts to use end-to-end ASR models such as CTC with monophones [8,25] or end-to-end LF-MMI with biphones [28,27] for multilingual and crosslingual recognition. Remarkably, the end-to-end CTC-CRF model, which is defined by a CRF (conditional random field) with CTC topology, has been shown to perform significantly better than CTC [20,21]. Moreover, mono-phone CTC-CRF performs comparably to bi-phone end-to-end LF-MMI [28] and avoids context-dependent modeling with a simpler pipeline, which is particularly attractive for multilingual and crosslingual speech recognition.…”
Section: Related Work | mentioning
confidence: 99%
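The "CTC topology" mentioned in the snippet above refers to the standard CTC mapping from frame-level paths to label sequences: consecutive repeated symbols are merged, then blanks are removed. The minimal sketch below shows that mapping and brute-forces the set of paths that collapse to a given label sequence. The blank symbol name `<b>` and the helper names `ctc_collapse` and `paths_for` are hypothetical and chosen only for illustration; real systems encode this mapping as a graph rather than enumerating paths.

```python
from itertools import product

BLANK = "<b>"  # hypothetical name for the CTC blank symbol


def ctc_collapse(path):
    """CTC mapping: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for sym in path:
        # Emit a symbol only when it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore keeps them as separate labels.
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def paths_for(labels, alphabet, num_frames):
    """Brute-force enumeration of all length-num_frames paths that collapse to `labels`."""
    target = tuple(labels)
    return [p for p in product(alphabet, repeat=num_frames) if ctc_collapse(p) == target]


if __name__ == "__main__":
    alphabet = [BLANK, "a", "b"]
    for p in paths_for(["a", "b"], alphabet, 3):
        print(p)
```

With three frames, five paths collapse to `("a", "b")`, for example `("a", "a", "b")` and `("<b>", "a", "b")`; `("a", "<b>", "a")` collapses to `("a", "a")` because the blank separates the repeated label.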
“…In this section, we briefly explain the CTC-CRF based framework [20,21] to use phone embeddings for ASR. Consider discriminative training with the objective to maximize the conditional likelihood [20]:…”
Section: CTC-CRF Based ASR | mentioning
confidence: 99%
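The conditional-likelihood objective referred to in this snippet is elided in the quote. As a rough illustration only, the sketch below assumes a CRF over CTC paths whose potential for a frame-level path π is the sum of per-frame scores plus a sequence-level language-model log-probability of the collapsed label sequence. Under that assumption, log p(l | x) is the log-sum-exp of potentials over paths that collapse to l, minus the log-sum-exp over all paths. The names `frame_scores`, `toy_lm`, and `crf_log_likelihood` and all numbers are hypothetical. Enumeration is exponential in the number of frames; practical CTC-CRF systems compute both terms with the forward algorithm over numerator and denominator graphs.

```python
import math
from itertools import product

BLANK = "<b>"  # hypothetical name for the CTC blank symbol


def ctc_collapse(path):
    """CTC mapping: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def potential(path, frame_scores, lm_logprob):
    """Assumed potential: per-frame scores plus an LM score of the collapsed labels."""
    acoustic = sum(frame_scores[t][sym] for t, sym in enumerate(path))
    return acoustic + lm_logprob(ctc_collapse(path))


def crf_log_likelihood(labels, frame_scores, alphabet, lm_logprob):
    """log p(l | x): log-sum-exp over paths collapsing to l minus log-sum-exp over all paths."""
    all_paths = list(product(alphabet, repeat=len(frame_scores)))
    scores = [potential(p, frame_scores, lm_logprob) for p in all_paths]
    target = tuple(labels)
    numerator = [s for p, s in zip(all_paths, scores) if ctc_collapse(p) == target]
    if not numerator:
        return float("-inf")  # no path of this length can emit these labels
    return logsumexp(numerator) - logsumexp(scores)


def toy_lm(seq):
    """Hypothetical toy language model: a fixed log-penalty per label."""
    return -0.5 * len(seq)


if __name__ == "__main__":
    alphabet = [BLANK, "a", "b"]
    # Hypothetical unnormalized per-frame scores for three frames.
    frame_scores = [
        {BLANK: 0.1, "a": 1.0, "b": -0.5},
        {BLANK: 0.3, "a": 0.2, "b": 0.4},
        {BLANK: 0.0, "a": -1.0, "b": 1.2},
    ]
    print(crf_log_likelihood(["a", "b"], frame_scores, alphabet, toy_lm))
```

Because normalization is over whole paths rather than per frame, the per-frame scores need not be probabilities, which is one way such a CRF differs from plain CTC with its per-frame softmax. A quick sanity check is that the probabilities of all reachable label sequences sum to one.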