Abstract-Detecting code-switching points is increasingly important as globalization and multilingualism grow. The task is challenging, but computational methods can make it tractable. This paper presents an approach for detecting code-switching points in Tagalog-English text input, particularly text in which English and Tagalog words alternate. The approach uses frequency counts of word bigrams and unigrams from language models trained on an existing, available corpus. Three categories of test data were used: Twitter posts, conversations, and short stories. The test data comprised a total of 3,088 English and Tagalog words. The results show that the system's accuracy in identifying English and Tagalog words ranged from 81% to 95%, while the F-measure ranged from 72% to 95%. The research can be extended and improved using other n-grams, stemming, and searching algorithms.

Index Terms-Code-switching point detection, intra-sentential code-switching, word bigram, word unigram.
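To make the idea of frequency-based detection concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how unigram and bigram counts are combined, so the scoring rule (bigram counts weighted twice as heavily as unigram counts), the tie-breaking rule, the tag names `EN`/`TL`/`UNK`, and the toy training sentences are all assumptions made for illustration.

```python
from collections import Counter


def train(sentences):
    """Count word unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        tokens = [t.lower() for t in tokens]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(model, prev, word, nxt):
    """Score a word under one language model.

    Assumption: bigram evidence (with the previous and next word) counts
    twice as much as unigram evidence. The paper's actual rule is not
    given in the abstract.
    """
    unigrams, bigrams = model
    return unigrams[word] + 2 * (bigrams[(prev, word)] + bigrams[(word, nxt)])


def tag_words(tokens, en_model, tl_model):
    """Label each token EN or TL by comparing its score under both models."""
    tokens = [t.lower() for t in tokens]
    tags = []
    for i, word in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        en = score(en_model, prev, word, nxt)
        tl = score(tl_model, prev, word, nxt)
        if en > tl:
            tags.append("EN")
        elif tl > en:
            tags.append("TL")
        else:
            # Assumed tie-break: inherit the previous tag, else mark unknown.
            tags.append(tags[-1] if tags else "UNK")
    return tags


def switch_points(tags):
    """Return the indices at which the language tag differs from the previous one."""
    return [i for i in range(1, len(tags)) if tags[i] != tags[i - 1]]


# Toy training data (hypothetical; a real system would use a large corpus).
en_model = train([["i", "want", "to", "eat"], ["we", "will", "go", "to", "the", "mall"]])
tl_model = train([["gusto", "ko", "kumain"], ["pupunta", "tayo", "sa", "palengke"]])

tags = tag_words(["Gusto", "ko", "to", "eat"], en_model, tl_model)
points = switch_points(tags)
print(tags)    # ['TL', 'TL', 'EN', 'EN']
print(points)  # [2]
```

Here a code-switching point is reported as the index of the first word in the new language. Words that appear in neither model fall back to the preceding tag, which is one simple way to handle out-of-vocabulary tokens; stemming, as the abstract suggests for future work, would be another.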