In this paper, we take Indonesian as the research object and propose a multiple filter correction framework (MFCF). The main idea of MFCF is to remove noise from the candidate word list so that the correct word is more likely to be selected. In MFCF, we use a window search algorithm (WSA) to filter candidate words from the dictionary. When searching for candidate words at a Levenshtein distance of 1, WSA reduces the candidate search space by an average of 71%. At a Levenshtein distance of 2, it reduces the search space by an average of 55%. This smaller search space makes search faster: for Levenshtein distances of 1 and 2, WSA is faster than current state-of-the-art search algorithms. This paper also introduces a character vector-based candidate word scoring model (CWSM-CV), a simple, unsupervised method. In MFCF, we use CWSM-CV to select the correct word from the candidate word list. We also explore whether a word vector-based candidate word scoring model (CWSM-WV) can be used to score candidate words. From this exploration, we find that the candidate word list needs to be denoised, and we verify this finding experimentally. To apply this finding to text correction, we propose a new set of evaluation indicators to replace accuracy. Finally, we recommend that researchers working on text correction for low-resource languages make their models open systems and publish them for users. Such a system takes user feedback as new data, gradually reducing the negative impact of limited data volume.
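The abstract does not describe how WSA prunes the dictionary. To illustrate why shrinking the search space speeds up candidate lookup, here is a minimal sketch that uses a generic length-window prefilter instead. This is an assumption for illustration only and is not the authors' WSA. Levenshtein distance is never smaller than the difference in word lengths, so any dictionary word whose length differs from the query by more than `k` can be skipped without computing its distance. The names `levenshtein` and `candidates` and the small Indonesian word list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def candidates(query: str, dictionary: list[str], max_dist: int) -> tuple[list[str], int]:
    """Return dictionary words within max_dist of query, plus the number of words
    that were actually compared after the (hypothetical) length-window prefilter."""
    # Edit distance >= |len(w) - len(query)|, so words outside this window
    # cannot be candidates and are never passed to levenshtein().
    window = [w for w in dictionary if abs(len(w) - len(query)) <= max_dist]
    found = [w for w in window if levenshtein(query, w) <= max_dist]
    return found, len(window)


# Toy usage with a misspelling of "makan" (to eat).
words = ["makan", "makanan", "minum", "makam", "tidur", "main", "rumah",
         "perpustakaan", "berjalan", "sekolah"]
found, compared = candidates("makn", words, max_dist=1)
print(found, compared)  # ['makan', 'main'] 6
```

In this toy run, only 6 of the 10 words reach the full distance computation. A real filter such as WSA would presumably prune more aggressively, and the 71% and 55% reductions reported above refer to the authors' method, not to this sketch.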