Jonathan Mukiibi scite author profile

Jonathan Mukiibi

5 Publications

35 Citation Statements Received

80 Citation Statements Given

Affiliations

Makerere University

Publications

MasakhaNER: Named Entity Recognition for African Languages

Adelani, Abbott, Neubig et al. 2021

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

Adelani, Alabi, Fan et al. 2022

Preprint

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Adelani, Neubig, Ruder et al. 2022

Preprint

A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

Adelani, Alabi, Fan et al. 2022

Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.

An English-Luganda parallel corpus

Mukiibi, Babirye, Nakatumba‐Nabende 2021
