The growing importance of Unicode for text files increases both the storage space required for data and the time needed to transmit it, with a corresponding need for data compression. Conventional compressors fare poorly on UTF-8 texts, where each character can span multiple bytes. Malayalam, one of the four major languages of the Dravidian family, is represented using Unicode characters. The contribution of this paper is a reversible transformation that maps the input to a smaller representation, reducing the actual size of the input file before a general-purpose compression method is applied. After this preprocessing, LZW achieves better compression on Malayalam text files, which may contain any characters, including ASCII. The method can be extended to files in any native language that consist mostly of characters from a single script.
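To make the idea of a reversible byte-level mapping concrete, the following is a minimal sketch and not the paper's actual transformation, which the abstract does not specify. It assumes the Malayalam Unicode block U+0D00 to U+0D7F, whose code points each occupy three bytes in UTF-8, and maps each of them to a single byte in the range 0x80 to 0xFF. ASCII passes through unchanged, and any other character is stored behind an escape byte. The names `encode`, `decode`, and `ESCAPE`, and the escape scheme itself, are illustrative assumptions.

```python
# Illustrative sketch only: a reversible one-byte mapping for text that is
# mostly Malayalam. The escape scheme below is an assumption, not the
# transformation described in the paper.

MALAYALAM_START = 0x0D00  # first code point of the Malayalam Unicode block
MALAYALAM_END = 0x0D7F    # last code point of the block (128 code points)
ESCAPE = 0x01             # hypothetical escape byte for all other characters


def encode(text: str) -> bytes:
    """Map Malayalam code points to single bytes; escape everything else."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and cp != ESCAPE:
            # ASCII is kept as-is (except the escape byte itself).
            out.append(cp)
        elif MALAYALAM_START <= cp <= MALAYALAM_END:
            # One byte in 0x80..0xFF instead of three UTF-8 bytes.
            out.append(0x80 + (cp - MALAYALAM_START))
        else:
            # Any other character: ESCAPE, UTF-8 length, then the UTF-8 bytes.
            raw = ch.encode("utf-8")
            out.append(ESCAPE)
            out.append(len(raw))
            out.extend(raw)
    return bytes(out)


def decode(data: bytes) -> str:
    """Invert encode(), recovering the original string exactly."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESCAPE:
            n = data[i + 1]
            chars.append(data[i + 2:i + 2 + n].decode("utf-8"))
            i += 2 + n
        elif b < 0x80:
            chars.append(chr(b))
            i += 1
        else:
            chars.append(chr(MALAYALAM_START + (b - 0x80)))
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    sample = "മലയാളം text"
    packed = encode(sample)
    assert decode(packed) == sample  # the mapping is lossless
    print(len(sample.encode("utf-8")), "bytes in UTF-8 ->", len(packed), "bytes mapped")
```

In a pipeline of the kind the abstract describes, the output of `encode` would then be passed to a general-purpose compressor such as LZW, and `decode` would be applied after decompression. Under this sketch, a character outside both ASCII and the Malayalam block costs two extra bytes, so the mapping only pays off when such characters are rare.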