Arithmetic coding is a very efficient and most popular entropy coding technique. Arithmetic coding method is based on the fact that the cumulative probability of a symbol sequence corresponds to a unique subinterval of the initial interval [0, 1). In this method, when encoding a symbol, it first computes new interval [low, high) based on cumulative probability segment of the symbol. Thereafter it iterates in a loop to output code bits till the interval becomes 2 b-2 wide, where b is number of bits used to store range of an interval. In conventional implementation of arithmetic coding algorithm, in single loop iteration, only one bit is processed at a time. When most significant bit (msb) of low and high of a subinterval matches, it writes this msb in coded message and doubles the interval by extracting msb. When underflow occurs, it extracts second msb to expand an interval. Processing such single bit and expanding an interval is also called renormalization in a loop. In this paper, an upgradation of this conventional arithmetic coding algorithm is proposed, wherein more than one bit is processed at a time instead of just single bit in single iteration. This proposed multi-bit processing arithmetic coding algorithm is implemented here to reduce the iterations needed in renormalizing an interval. It is observed that processing multiple output bits at a time leads to big improvement in execution time. To determine the number of maximum possible matching most significant bits to output, two alternatives are used here; (i) Using shift operation in a loop (ii) Using log function. It is found that first technique is far better than second one with respect to execution time. As compared to conventional implementations processing single bit at a time, about 52% overall saving in execution time is observed when processing multi-bits using shift operation in a loop; whereas about 31% overall loss in performance is observed with the technique of using log function. We have also tried these two alternative ways to determine the number of consecutive occurrences of underflow and process them all in single iteration; but it has not shown any significant gain in speed. As expected, in using any of the above methods, there is no compromise in compression ratio.
Arithmetic coding is used in many compression techniques during the entropy encoding stage. Further compression is not possible without changing the data model and increasing redundancy in the data set. To increase the redundancy, we have applied index based byte-pair transformation (BPT
Byte pair encoding (BPE) algorithm was suggested by P. Gage is to achieve data compression. It encodes all instances of most frequent byte-pair using zero-frequency byte in the source data. This process is repeated for maximum m possible number of passes until no further compression is possible, either because there are no more frequently occurring byte pairs or there are no more unused zero-frequency bytes to represent pairs. It writes out substitution information before the encoded data in each pass. This algorithm is very time consuming as it requires to determine most frequent byte-pair in each pass before starting substitution. We have proposed kpass byte-pair transformation algorithm where k may be very very small as compared to maximum possible passes m. Our aim is to minimize the compression time and achieve equvivalent compression rate. Proposed algorithm transforms half of the possible most-frequent byte pairs in each pass except the last. In the last pass, it transforms all remaining possible byte-pairs. This reduced number of passes save the time taken in computing frequency of byte-pairs in maximum m passes. Experimental results have shown that proposed algorithm had taken 3. 213, 9.794, 13.324, 16.323, 22.388 seconds with 1, 2, 3, 4 and 6 passes respectively as compared to 295.642 seconds of m-passes. Compression rate achieved due to transformation is 14.72%, 20.12%, 21.89%, 22.67% and 22.96% with 1, 2, 3, 4 and 6 passes respectively as compared to 25.55% using maximum m-passes. As the number of passes increases, compression is better with increased execution time. Our aim of achieving speed is achieved with little loss in compression rate.
A new data structure, namely "cumulative frequency matrix (CFM)", is proposed here for maintaining cumulative frequencies. For an order-0 model having 256 symbols, CFM is a 2-D array of 16 rows and 16 columns. Two nibbles, say L for left and R for right, of a byte symbol represents row and column dimensions respectively. Matrix element (L, R) represents cumulative frequency of symbol with right nibble as R among symbols with left nibble as L. Within row, it stores cumulative frequency of symbols with right nibble varying from 0 to 15. Adaptive arithmetic coding is a lossless data compression method. It needs to update cumulative frequencies at runtime. Various algorithms for maintaining cumulative frequencies, computing cumulative frequency interval etc. are discussed here. Practical implementation shows that proposed data structure is simpler as well as efficient as compared to other data structures in use.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.