Interspeech 2022
DOI: 10.21437/interspeech.2022-10326

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Abstract: Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Because of irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system that extends the Conformer encoder-decoder model with a cross-modal conversational representation. Our approach leverages a cross-modal …

Cited by 6 publications; references 34 publications.