2022
DOI: 10.1109/taslp.2022.3153268

ARoBERT: An ASR Robust Pre-Trained Language Model for Spoken Language Understanding

Cited by 15 publications (5 citation statements)
References 40 publications
“…SLU is an essential task for machines to infer correct semantic meaning (e.g., intent detection, slot filling) from human speech (Zhou et al., 2020; Cheng et al., 2023b,d,e). Traditionally, it can be solved by fine-tuning NLP models (especially PLMs) with the ASR hypothesis as input (Wang et al., 2022a). However, the ASR hypothesis often contains errors caused by ASR systems.…”
Section: Related Work (mentioning)
confidence: 99%
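The baseline setup this passage describes (fine-tuning a PLM directly on ASR hypotheses) can be sketched as below. This is a minimal illustration, not ARoBERT's actual training code; the checkpoint, the number of intent labels, and the toy ASR hypotheses are assumed placeholders.

```python
# Sketch: fine-tune a generic PLM for intent detection on ASR hypotheses.
# Checkpoint, label count, and examples are illustrative assumptions only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=7)  # 7 intents is an arbitrary example

# Toy ASR hypotheses (note the recognition errors) paired with intent labels.
examples = [
    ("play some jaz music", 0),          # "jaz": ASR error for "jazz"
    ("what's the whether tomorrow", 1),  # "whether": ASR error for "weather"
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for text, label in examples:
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice the quoted papers fine-tune on full SLU datasets rather than a handful of utterances; the loop above only shows the input/label plumbing.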
“…The Transformer model [3, 19-23] is built solely on self-attention mechanisms and is applied to natural language processing tasks.…”
Section: Transformer (mentioning)
confidence: 99%
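The self-attention operation this statement refers to is scaled dot-product attention; the sketch below is a simplified single-head version with assumed dimensions, not the full multi-head Transformer layer of [3].

```python
# Single-head scaled dot-product self-attention (simplified sketch).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)  # every token attends to every token
    return weights @ v                   # contextualized representations

x = torch.randn(5, 16)                       # 5 tokens, d_model = 16
w_q, w_k, w_v = (torch.randn(16, 16) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)       # shape: (5, 16)
```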
“…Pre-training models have shown great promise in natural language processing, with the Transformer model [1] proposing an encoder-decoder architecture based solely on the self-attention mechanism, enabling the construction of large-scale models that can be pre-trained on vast amounts of data. Language models [2-4] can be broadly categorized into two types: autoregressive language modeling and autoencoder language modeling. Autoregressive language models, such as ELMo [5], GPT [6], and T5 [7], predict the next possible word based on the preceding context, making them well-suited for generative tasks.…”
Section: Introduction (mentioning)
confidence: 99%
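One way to see the contrast this passage draws between autoregressive and autoencoder language modeling is to compare next-token prediction with masked-token reconstruction. In the sketch below, the GPT-2 and BERT checkpoints are illustrative stand-ins for the two families, not models discussed in the paper.

```python
# Sketch: autoregressive (next-token) vs. autoencoder (masked-token) LMs.
# Checkpoints and prompts are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForMaskedLM)

# Autoregressive: predict the next word from the left context only.
ar_tok = AutoTokenizer.from_pretrained("gpt2")
ar_lm = AutoModelForCausalLM.from_pretrained("gpt2")
ids = ar_tok("The weather today is", return_tensors="pt").input_ids
next_id = ar_lm(ids).logits[0, -1].argmax()
print("AR next token:", ar_tok.decode(next_id))

# Autoencoder: reconstruct a masked word from bidirectional context.
ae_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ae_lm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
masked = ae_tok("The weather today is [MASK].", return_tensors="pt")
mask_pos = (masked.input_ids == ae_tok.mask_token_id).nonzero()[0, 1]
fill_id = ae_lm(**masked).logits[0, mask_pos].argmax()
print("MLM filled token:", ae_tok.decode(fill_id))
```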
“…The Transformer architecture was first introduced in the 2017 paper "Attention is All You Need" by Vaswani et al., which used a self-attention mechanism to compute contextualized word embeddings. Large language models, and Transformer-based language models more broadly, have been applied to numerous NLU tasks across a wide range of business areas and industries, such as building chatbots [9,10,11], improving product recommendations [12], and analyzing financial reports or news articles [13,14], as well as a growing number of LLM applications in healthcare [15], which have enhanced the efficiency of medical resource allocation and provided appropriate medical services to patients. Its exploration in automatic scoring of USMLE patient notes has also achieved significant…”
Section: Introduction (mentioning)
confidence: 99%