Bodo Resources for NLP - An Overview of Existing Primary Resources for Bodo

Narzary, Mwnthai; Muchahary, Gwmsrang; Brahma, Maharaj; Narzary, Sanjib; Singh, Pranav Kumar; Senapati, Apurbalal

doi:10.21467/proceedings.115.12

Cited by 5 publications (7 citation statements)

“…As a result, we've established specific strict guidelines. These regulations are based on the community standards of Facebook¹⁰ and YouTube¹¹. Comments with the following aims should be marked as hate.…”

Section: Dataset Annotation (mentioning, confidence: 99%)

“…Indian government law is also introduced against hate speech [6]. Several social media platforms revised their community guidelines to eradicate hate, automatically detecting hate comments and posts and giving users access to report posts and comments¹². English and other popular languages benefit from their global popularity.…”

Section: Introduction (mentioning, confidence: 99%)

“…As an associate official language of the Indian state of Assam, Bodo is widely spoken in the Bodoland Territorial Region³. Among the official languages of India, it has gained some recognition⁴ [11]. The 2011 Indian Census⁵ estimates a total of 1,482,929 Bodo speakers, including 1,454,547 native speakers.…”

Section: Introduction (mentioning, confidence: 99%)

Hate Speech Detection in Low-Resource Bodo and Assamese Texts with ML-DL and BERT Models

Ghosh, Senapati, Narzary et al. 2023, SCPE

Hate speech detection is a hot recent research topic in natural language processing (NLP). Unrestrained use of social media platforms makes people over-opinionated, to the point of leaving toxic comments and posts. A toxic outlook increases violence towards neighbours, states, countries, and continents. Several countries have introduced laws to address this urgent problem, and all the media platforms have now started working on restricting hateful posts and comments. When treated as a supervised task, hate speech detection is generally a text classification problem. Processing text computationally is challenging because of its semantic and grammatical complexity. Resource-rich languages benefit from their abundance of data, whereas resource-scarce languages suffer significantly from a lack of datasets. This paper makes a multifaceted contribution encompassing resource generation, experimentation with Machine Learning (ML), Deep Learning (DL) and state-of-the-art transformer-based models, and a comprehensive evaluation of model performance, including thorough error analysis. In the realm of resource generation, it adds to the North-East Indian Hate Speech tagged dataset (NEIHS version 1), which encompasses two languages: Assamese and Bodo.

“…Under this wave a new dawn began with the growing socio-political consciousness among a handful of enlightened Bodos. This came at a time when a formidable number of Bodos had already renounced their ancestral traditions and ethnicity by adopting an Assamese identity to avoid social discrimination (Narzary B., 2007). Furthermore, the British policy of opening the doors of Assam to outsiders to fill the needs of colonial administration allowed a free flow of people from the rest of India and neighboring countries.…”

Section: Rise Of Bodo Identity and Political Renaissance Under Gurude... (mentioning, confidence: 99%)

Ethnic Assertion and Electoral Politics of the Bodo Tribe in Assam

Basumatary, Mushahary 2023, E3S Web Conf.

Ethnic assertion is a widely known political phenomenon in North East India. The Bodos, one of the largest ethnic groups in North East India, have long campaigned for a Bodo homeland in Assam. This paper critically analyses the various stages of Bodo assertion, the rise of Bodo identity consciousness, sub-nationalism, the evolution of Bodo politics, and the simultaneous participation in electoral politics since the colonial era under the different organizations and leaders of the time, from the first-generation Bodo leaders of the first half of the 20th century through the post-Independence era and beyond. The study is limited in scope: it emphasizes only Bodo assertion and electoral participation within the political and territorial framework of BTC/BTR. BTC/BTR comprises four districts of Assam, i.e. Kokrajhar, Chirang, Baksa and Udalguri. It is administered under the Sixth Schedule of the Constitution of India and exercises limited legislative and executive jurisdiction within the constitutional framework. The study is analytical in nature and includes observation and critical analysis of secondary sources. It also draws on primary sources for electoral data and statistics from various official sources and records.

“…The text of the raw corpus is from different domains such as Aesthetics (Culture, Cinema, Literature, Biographies, and Folklore), Commerce, Mass media (Classified, Discussion, Editorial, Sports, General news, Health, Weather, and Social), Science and Technology (Agriculture, Environmental Science, Textbook, Astrology, Mechanical Engineering, and Environmental Science) and Social Sciences (Economics, Education, Political Science, Linguistics, Health and Family Welfare, History, Text Book, Law, etc.). We also acquired another corpus from the work of Narzary et al. (2022). The final consolidated corpus has 1.6 million tokens and 191k sentences.…”

Section: Introduction (mentioning, confidence: 99%)

Part-of-speech tagger for Bodo language using deep learning approach

Pathak, Narzary, Nandi et al. 2024, Nat. lang. processing

Language processing systems such as part-of-speech (POS) tagging, named entity recognition, machine translation, speech recognition, and language modeling have been well studied in high-resource languages. Nevertheless, research on these systems for several low-resource languages, including Bodo, Mizo, Nagamese, and others, has either yet to commence or is in its nascent stages. Language models (LMs) play a vital role in the downstream tasks of modern natural language processing. Extensive studies have been carried out on LMs for high-resource languages; however, these low-resource languages are still underrepresented. In this study, we first present BodoBERT, an LM for the Bodo language. To the best of our knowledge, this work is the first such effort to develop an LM for Bodo. Second, we present an ensemble deep learning-based POS tagging model for Bodo. The POS tagging model combines a BiLSTM with a conditional random field (CRF) and a stacked embedding of BodoBERT with BytePairEmbeddings. We evaluate several LMs in the experiment to see how well they perform on the POS tagging task. The best-performing model achieves an F1 score of 0.8041. A comparative experiment was also conducted on Assamese POS taggers, since Assamese is spoken in the same region as Bodo.
