Purpose -The purpose of this paper is to propose a lossy/lossless binary textual image compression method based on an improved pattern matching (PM) technique. Design/methodology/approach -In the Farsi/Arabic script, contrary to the printed Latin script, letters usually attach together and produce various patterns. Hence, some patterns are fully or partially subsets of some others. Two new ideas are proposed here. First, the number of library prototypes is reduced by detecting and then removing the fully or partially similar prototypes. Second, a new effective pattern encoding scheme is proposed for all types of patterns including text and graphics. The new encoding scheme has two operation modes of chain coding and soft PM, depending on the ratio of the pattern area to its chain code effective length. In order to encode the number sequences, the authors have modified the multi-symbol QM-coder. The proposed method has three levels for the lossy compression. Each level, in its turn, further increases the compression ratio. The first level includes applying some processing in the chain code domain such as omission of small patterns and holes, omission of inner holes of characters, and smoothing the boundaries of the patterns. The second level includes the selective pixel reversal technique, and the third level includes using the proposed method of prioritizing the residual patterns for encoding, with respect to their degree of compactness. Findings -Experimental results show that the compression performance of the proposed method is considerably better than that of the best existing binary textual image compression methods as high as 1.6-3 times in the lossy case and 1.3-2.4 times in the lossless case at 300 dpi. The maximum compression ratios are achieved for Farsi and Arabic textual images. Research limitations/implications -Only the binary printed typeset textual images are considered. Practical implications -The proposed method has a high-compression ratio for archiving and storage applications. Originality/value -To the authors' best knowledge, the existing textual image compression methods or standards have not so far exploited the property of full or partial similarity of prototypes for increasing the compression ratio for any scripts. Also, the idea of combining the boundary description methods with the run-length and arithmetic coding techniques has not so far been used.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.