Joseph Marvin Imperial scite author profile

Joseph Marvin Imperial

5 Publications

11 Citation Statements Received

57 Citation Statements Given

Affiliations

National University, De La Salle University, University of the Philippines Manila

Publications

A Baseline Readability Model for Cebuano

Imperial, Reyes, Ibanez et al. 2022

In this study, we developed the first baseline readability model for the Cebuano language. Cebuano is the second most-used native language in the Philippines, with about 27.5 million speakers. As the baseline, we extracted traditional or surface-based features, syllable patterns based on Cebuano's documented orthography, and neural embeddings from the multilingual BERT model. Results show that the first two handcrafted linguistic feature sets obtained the best performance when used to train an optimized Random Forest model, reaching approximately 87% across all metrics. The feature sets and algorithm used are also similar to those in previous results in readability assessment for the Filipino language, showing potential for crosslingual application. To encourage more work on readability assessment in Philippine languages such as Cebuano, we open-sourced both code and data.

BERT Embeddings for Automatic Readability Assessment

Imperial 2021

In recent years, the main focus of research on automatic readability assessment (ARA) has shifted towards using expensive deep learning-based methods with the primary goal of increasing models' accuracy. This, however, is rarely applicable to low-resource languages, where traditional handcrafted features are still widely used due to the lack of existing NLP tools to extract deeper linguistic representations. In this work, we take a step back from the technical component and focus on how linguistic aspects such as mutual intelligibility or degree of language relatedness can improve ARA in a low-resource setting. We collect short stories written in three languages in the Philippines (Tagalog, Bikol, and Cebuano) to train readability assessment models and explore the interaction of data and features in various crosslingual setups. Our results show that the inclusion of CROSSNGO, a novel specialized feature exploiting n-gram overlap applied to languages with high mutual intelligibility, significantly improves the performance of ARA models compared to the use of off-the-shelf large multilingual language models alone. Consequently, when both linguistic representations are combined, we achieve state-of-the-art results for Tagalog and Cebuano, and baseline scores for ARA in Bikol.

An experimental Tagalog Finite State Automata spellchecker with Levenshtein edit-distance feature

Imperial, Ya-on, Ureta 2019

Doctor’s Cursive Handwriting Recognition System Using Deep Learning

Fajardo, Sorillo, Garlit et al. 2019

A BERT-based Hate Speech Classifier from Transcribed Online Short-Form Videos

Urbano, Ajero, Angeles et al. 2021
