Sun, Sining scite author profile


5 Publications

0 Citation Statements Received

51 Citation Statements Given


Publications


Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Zhang, Sining, Xie et al. 2022


Automatic Speech Recognition (ASR) in conversational settings presents unique challenges, including extracting relevant contextual information from previous conversational turns. Because of irrelevant content, error propagation, and redundancy, existing methods struggle to extract longer and more effective contexts. To address this issue, we introduce a novel conversational ASR system that extends the Conformer encoder-decoder model with cross-modal conversational representation. Our approach leverages a cross-modal extractor that combines pre-trained speech and text models through a specialized encoder and a modal-level mask input. This enables the extraction of richer historical speech context without explicit error propagation. We also incorporate conditional latent variational modules to learn conversation-level attributes such as role preference and topic coherence. By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss, achieving relative accuracy improvements of 8.8% and 23% on the Mandarin conversation datasets HKUST and MagicData-RAMC, respectively, compared to the standard Conformer model.
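The abstract mentions conditional latent variational modules for conversation-level attributes but does not give their formulation. As a hedged illustration only, the sketch below shows the generic conditional-VAE machinery such modules commonly rely on: a reparameterized latent sample and the KL term between a posterior q(z | history, current utterance) and a prior p(z | history). The diagonal-Gaussian assumption, the prior/posterior conditioning, and the helper names `reparameterize` and `gaussian_kl` are assumptions for illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers illustrating a generic conditional VAE latent layer;
# this is NOT the authors' code. Latents are assumed to be diagonal Gaussians
# parameterized by a mean vector and a log-variance vector.


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def gaussian_kl(mu_q: Sequence[float], logvar_q: Sequence[float],
                mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed setup: the posterior sees history + current utterance,
    # the prior sees history only (so only the prior is usable at inference).
    mu_post, logvar_post = [0.5, -0.2], [-1.0, -0.5]
    mu_prior, logvar_prior = [0.0, 0.0], [0.0, 0.0]
    z = reparameterize(mu_post, logvar_post, rng)  # latent passed to the decoder
    kl = gaussian_kl(mu_post, logvar_post, mu_prior, logvar_prior)
    print(len(z), round(kl, 4))
```

In a typical setup of this kind, the KL term (often weighted) is added to the recognition loss during training; at inference only the prior branch is available, so z would be drawn from p(z | history).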


Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Zhang, Sining, Xie et al. 2022

Preprint


Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

Zhanheng, Sining, Wang et al. 2023

Preprint


Two Stage Contextual Word Filtering for Context Bias in Unified Streaming and Non-streaming Transducer

Zhanheng, Sining, Wang et al. 2023


Conversational Speech Recognition By Learning Conversation-level Characteristics

Zhang, Sining, Xie et al. 2022

Preprint


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA


Part of the Research Solutions Family.
