ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
DOI: 10.1109/icassp49357.2023.10095115
|View full text |Cite
|
Sign up to set email alerts
|

Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio

Abstract: We propose CONF-TSASR, a non-autoregressive end-toend time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The model consists of a TitaNet based speaker embedding module, a Conformer based masking as well as ASR modules. These modules are jointly optimized to transcribe a targetspeaker, while ignoring speech from other speakers. For training we use Connectionist Temporal Classification (CTC) loss and introduce a scale-invariant spectrogram reconstruction l… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
3
2

Relationship

0
5

Authors

Journals

citations
Cited by 5 publications
references
References 23 publications
0
0
0
Order By: Relevance