Next word prediction is a helpful feature in various typing subsystems. Suggestions offered during typing also speed up the writing of digital documents, so researchers have long sought to improve such prediction systems. Knowledge of the meaning of individual words, together with a contextual understanding of the word sequence, can improve next word prediction. These ideas are promising in theory, and advances in Natural Language Processing (NLP) have made them applicable in real scenarios. Two NLP techniques address these points: word embedding captures various relations among words and encodes their meaning, while sequence modeling captures contextual information. In this paper, we investigate which embedding method works better for Bengali next word prediction. We compare four embeddings: word2vec skip-gram, word2vec CBOW, fastText skip-gram, and fastText CBOW. Each is applied in a deep learning sequential model based on LSTM, trained on a large corpus of Bengali texts. The results reveal useful insights about how contextual and sequential information is gathered, which will help in implementing a context-based Bengali next word prediction system.
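To make the four compared configurations concrete, the following is a minimal sketch, not the paper's implementation, of how the training examples differ. Skip-gram predicts each context word from the center word, CBOW predicts the center word from its context, and fastText additionally breaks words into character n-grams. The function names, the window size, the history length, and the sample sentence are all assumptions chosen for illustration; the paper's actual hyperparameters are not stated in this abstract.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context_word) pair per neighbor in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: one (context_words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style subword units: character n-grams of the word
    wrapped in '<' and '>' boundary markers. (The whole wrapped word,
    which fastText also keeps as its own unit, is omitted here.)"""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def next_word_examples(tokens, history=3):
    """Sequence-model view: (previous words, next word) pairs, the kind of
    example an LSTM next word predictor would be trained on."""
    return [(tokens[max(0, i - history):i], tokens[i])
            for i in range(1, len(tokens))]


if __name__ == "__main__":
    # Hypothetical sample sentence, used only for illustration.
    sentence = ["আমি", "ভাত", "খাই"]
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
    print(next_word_examples(sentence, history=2))
    print(char_ngrams("where", 3, 3))
```

In this sketch the embedding stage (the first three functions) and the sequence stage (the last function) are kept separate, which mirrors the abstract's distinction between word-level knowledge and contextual information.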
DUJASE Vol. 7 (2) 8-15, 2022 (July)