Innovative Bert-Based Reranking Language Models for Speech Recognition

Chiu, Shih‐Hsuan; Chen, Berlin

doi:10.1109/slt48900.2021.9383557

Cited by 28 publications

(13 citation statements)

References 30 publications


“…Each term $P(w_{1:t-1}, w_{t+1:T})$ is the prior probability obtained by applying Eqn. (7) again over a token string obtained by removing the $t$-th token from $w_{1:T}$. Therefore, Eqn.…”

Section: Converting Bidirectional LM Output Probabilities (mentioning)

confidence: 99%

“…Therefore, Eqn. (7) provides a recursive procedure to convert the bidirectional LM output probabilities into the exact sentence prior probability, which is presented for the first time to the best of the authors' knowledge. A link between unidirectional and bidirectional LMs can be found by equating the right hand sides of Eqn.…”

Section: Converting Bidirectional LM Output Probabilities (mentioning)

confidence: 99%

“…A solution is to build LMs with neural network (NN) models that can more reliably estimate sentence prior probabilities using longer contexts given a certain amount of text training data. Alternatively, additional out-of-domain data can be leveraged to improve LM training with limited in-domain data via LM adaptation and transfer learning [2][3][4][5][6][7][8].…”

Section: Introduction (mentioning)

confidence: 99%

“…Despite the wide-spread application of GPT and BERT in NLP and machine learning, there are only a very limited number of studies on their use in ASR [5][6][7][8]. In this paper, we present ASR results obtained using GPT and GPT-2 that are fine-tuned on in-domain data.…”

Section: Introduction (mentioning)

confidence: 99%

“…"FT" indicates if the pre-trained model is fine-tuned on in-domain data. "Ours" applies Eqn. (7) with different values of M.…”

mentioning

confidence: 99%


Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

Zheng, Zhang, Woodland

2021

Preprint


Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid language prior probability. A conversion method is proposed to compute the correct language prior probability based on bidirectional LM outputs in a mathematically exact way. Experimental results on the widely used AMI and Switchboard ASR tasks showed that the combination of the fine-tuned GPT and GPT-2 outperformed the combination of three neural LMs with different architectures trained from scratch on the in-domain text by up to a 12% relative word error rate reduction (WERR). Furthermore, the proposed conversion for language prior probabilities enables BERT to receive an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.
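The abstract's claim that the direct product of bidirectional output probabilities is not a valid prior can be checked on a toy example. The sketch below is an illustration only, not the conversion method of the cited paper: it assumes a hypothetical joint distribution over two binary tokens, computes each conditional P(w_t | all other tokens) exactly from that joint, and compares the sum of the per-token products over all sequences with the sum obtained from the left-to-right chain rule. All names and numbers are made up for the example.

```python
from itertools import product

# Hypothetical joint distribution over two binary tokens (illustrative numbers only).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def conditional(joint, seq, t):
    """P(w_t | all other tokens of seq), computed exactly from the joint."""
    denom = sum(
        p for s, p in joint.items()
        if all(s[i] == seq[i] for i in range(len(seq)) if i != t)
    )
    return joint[seq] / denom

def bidirectional_product(joint, seq):
    """Product over t of P(w_t | context on both sides), as a bidirectional LM would score."""
    prod = 1.0
    for t in range(len(seq)):
        prod *= conditional(joint, seq, t)
    return prod

def chain_rule(joint, seq):
    """Product over t of P(w_t | w_1..w_{t-1}), as a unidirectional LM would score."""
    prob = 1.0
    for t in range(len(seq)):
        num = sum(p for s, p in joint.items() if s[:t + 1] == seq[:t + 1])
        den = sum(p for s, p in joint.items() if s[:t] == seq[:t])
        prob *= num / den
    return prob

sequences = list(product([0, 1], repeat=2))
bidir_total = sum(bidirectional_product(joint, s) for s in sequences)
chain_total = sum(chain_rule(joint, s) for s in sequences)

print(f"sum of bidirectional products: {bidir_total:.2f}")  # 1.36, not a distribution
print(f"sum of chain-rule products:    {chain_total:.2f}")  # 1.00
```

In this toy case the bidirectional products sum to 1.36 rather than 1, so they cannot be read as sentence probabilities without some form of conversion or normalisation, whereas the chain-rule scores recover the joint exactly.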


Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

Min, Wang

2023

Communications in Computer and Information Science


Deep neural architecture for natural language image synthesis for Tamil text using BASEGAN and hybrid super resolution GAN (HSRGAN)

Diviya, Karmel

2023

Sci Rep


Tamil is a language that has the most extended history and is a conventional language of India. It has antique origins and a distinct tradition. A study reveals that at the beginning of the twenty-first century, more than 66 million people spoke Tamil. In the present time, image synthesis from text emerged as a promising advancement in computer vision applications. The research work done so far in intelligent systems is trained in universal language but still has not achieved the desired development level in regional languages. Regional languages have a greater scope for developing applications and will enhance more research areas to be explored, ruling out the barrier. The current work using Auto Encoders failed at the point of providing vivid information along with essential descriptions of the synthesised images. The work aims to generate embedding vectors using a language model headed by image synthesis using GAN (Generative Adversarial Network) architecture. The proposed method is divided into two stages: designing a language model TBERTBASECASE model for generating embedding vectors. Synthesising images using Generative Adversarial Network called BASEGAN, the resolution has been improved through two-stage architecture named HYBRID SUPER RESOLUTION GAN. The work uses Oxford-102 and CUB-200 datasets. The framework efficiency has been measured using F1 Score, Fréchet inception distance (FID), and Inception Score (IS). Language and image synthesis architecture proposed can bridge the gap between the research ideas in regional languages.
