2022
DOI: 10.1109/access.2022.3197662
Bangla-BERT: Transformer-Based Efficient Model for Transfer Learning and Language Understanding

Abstract: The advent of pre-trained language models has directed a new era of Natural Language Processing (NLP), enabling us to create powerful language models. Among these models, Transformer-based models like BERT have grown in popularity due to their cutting-edge effectiveness. However, these models heavily rely on resource-intensive languages, forcing other languages into multilingual models (mBERT). The two fundamental challenges with mBERT become significantly more challenging in a resource-constrained language lik…

Cited by 43 publications (26 citation statements) | References 45 publications
“…The model receives an input sentence X, represented as a sequence of tokens X = (x_1, ..., x_s), and outputs a single semantic [CLS] vector for each sentence. Before processing the raw sentence's tokens, DistilBERT applies a sequence of sub-word and WordPiece tokenization [68] to produce a set of embedding vectors named the input embedding (S1). The tokenization maps each token to three different embeddings: word, segment, and positional.…”
Section: Pre-processing Of Main Features (Key Features) | mentioning
confidence: 99%
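The tokenization and sentence-vector extraction described in the statement above can be illustrated with a minimal sketch. It assumes the Hugging Face transformers library and the distilbert-base-uncased checkpoint, which may differ from the citing work's setup: the sentence is split into WordPiece sub-words, and the hidden state at the [CLS] position is taken as the single sentence vector.

# Minimal sketch; assumes Hugging Face `transformers` and the
# `distilbert-base-uncased` checkpoint (illustrative, not from the cited work).
import torch
from transformers import DistilBertTokenizer, DistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

sentence = "Bangla-BERT is a transformer-based language model."

# WordPiece / sub-word tokenization; adds the special [CLS] and [SEP] tokens.
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The hidden state at the [CLS] position serves as the single sentence vector.
cls_vector = outputs.last_hidden_state[:, 0, :]  # shape: (1, hidden_size)
print(cls_vector.shape)

In the Hugging Face implementation, each token id is combined with a positional embedding inside the encoder; DistilBERT, unlike BERT, drops the segment embedding, so the three-embedding description in the quoted statement corresponds to the original BERT input layer.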
“…The architecture improves accuracy by 2.50% on the PolitiFact dataset and by 1.10% on the GossipCop dataset. To overcome BERT's language constraints, the authors of [13] proposed Bangla-BERT, with which they increase results by up to 5.3%.…”
Section: Related Work | mentioning
confidence: 99%
“…However, in contrast to an RNN, it can process the entire input at once, resulting in increased training speed. Another distinguishing aspect of transformers is their ability to quickly adapt to tasks they have not been trained on, i.e., transfer learning [145]. Due to their promising capabilities in learning long-term dependencies and transfer learning, transformers can be leveraged to deal with the high-mobility scenario of BM.…”
Section: E High Mobility | mentioning
confidence: 99%
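The transfer-learning idea mentioned in the statement above can be sketched as follows: a pretrained transformer checkpoint is loaded and only a small, newly initialized classification head is fine-tuned on the downstream task. The checkpoint, data, and task below are assumptions for illustration only, not taken from the cited work.

# Minimal transfer-learning sketch; checkpoint, data, and task are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Pretrained encoder weights are reused; only the classification head is new.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["high mobility sample", "static sample"]  # hypothetical task data
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps on the new task
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Because the encoder already captures general language (or sequence) structure from pretraining, only a small amount of task-specific data and a few epochs of fine-tuning are typically needed, which is what makes the transfer-learning property attractive for new scenarios.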