Context-based Bengali Next Word Prediction: A Comparative Study of Different Embedding Methods

Mahbub, Mahir; Akhter, Suravi; Kabir, Ahmedul; Begum, Zareena

doi:10.3329/dujase.v7i2.65088

Dhaka Uni. J. of Applied Sci. and Eng.

2023

DOI: 10.3329/dujase.v7i2.65088

|View full text |Cite

Context-based Bengali Next Word Prediction: A Comparative Study of Different Embedding Methods

Mahir Mahbub

Suravi Akhter

Ahmedul Kabir

et al.

Abstract: Next word prediction is a helpful feature for various typing subsystems. It is also convenient to have suggestions while typing to speed up the writing of digital documents. Therefore, researchers over time have been trying to enhance the capability of such a prediction system. Knowledge regarding the inner meaning of the words along with the contextual understanding of the sequence can be helpful in enhancing the next word prediction capability. Theoretically, these reasonings seem to be very promising. With … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2024

Publication Types

Select...

Article1

Relationship

Self Cite0

Independent1

Authors

Journals

Cited by 1 publication

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

Revealing the Next Word and Character in Arabic: An Effective Blend of Long Short-Term Memory Networks and ARABERT

Al-Anzi,

Shalini

2024

Applied Sciences

View full text Add to dashboard Cite

Arabic raw audio datasets were initially gathered to produce a corresponding signal spectrum, which was further used to extract the Mel-Frequency Cepstral Coefficients (MFCCs). The pronunciation dictionary, language model, and acoustic model were further derived from the MFCCs’ features. These output data were processed into Baidu’s Deep Speech model (ASR system) to attain the text corpus. Baidu’s Deep Speech model was implemented to precisely identify the global optimal value rapidly while preserving a low word and character discrepancy rate by attaining an excellent performance in isolated and end-to-end speech recognition. The desired outcome in this work is to forecast the next word and character in a sequential and systematic order that applies under natural language processing (NLP). This work combines the trained Arabic language model ARABERT with the potential of Long Short-Term Memory (LSTM) networks to predict the next word and character in an Arabic text. We used the pre-trained ARABERT embedding to improve the model’s capacity and, to capture semantic relationships within the language, we educated LSTM + CNN and Markov models on Arabic text data to assess the efficacy of this model. Python libraries such as TensorFlow, Pickle, Keras, and NumPy were used to effectively design our development model. We extensively assessed the model’s performance using new Arabic text, focusing on evaluation metrics like accuracy, word error rate, character error rate, BLEU score, and perplexity. The results show how well the combined LSTM + ARABERT and Markov models have outperformed the baseline models in envisaging the next word or character in the Arabic text. The accuracy rates of 64.9% for LSTM, 74.6% for ARABERT + LSTM, and 78% for Markov chain models were achieved in predicting the next word, and the accuracy rates of 72% for LSTM, 72.22% for LSTM + CNN, and 73% for ARABERET + LSTM models were achieved for the next-character prediction. This work unveils a novelty in Arabic natural language processing tasks, estimating a potential future expansion in deriving a precise next-word and next-character forecasting, which can be an efficient utility for text generation and machine translation applications.

show abstract

Revealing the Next Word and Character in Arabic: An Effective Blend of Long Short-Term Memory Networks and ARABERT

Al-Anzi,

Shalini

2024

Applied Sciences

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Context-based Bengali Next Word Prediction: A Comparative Study of Different Embedding Methods

Cited by 1 publication

References 25 publications

Revealing the Next Word and Character in Arabic: An Effective Blend of Long Short-Term Memory Networks and ARABERT

Revealing the Next Word and Character in Arabic: An Effective Blend of Long Short-Term Memory Networks and ARABERT

Contact Info

Product

Resources

About