arXiv:1712.02151v2 [cs.IT] 10 Jan 2018

In this work we consider a generalized version of Probability Smoothing (PS), the core elementary model (EM) for sequential prediction in the state-of-the-art PAQ family of data compression algorithms. Our main contribution is a code length analysis that bounds the redundancy of Probability Smoothing with respect to a Piecewise Stationary Source (PWS). The analysis holds for a finite alphabet and expresses redundancy in terms of the total variation in probability mass of the stationary distributions of a PWS. By choosing parameters appropriately, Probability Smoothing has redundancy O(S · √(T log T)) for sequences of length T with respect to a PWS with S segments. [...] and hence that it is one of the main drivers of PAQ's practical success [9].

Previous Work. Major approaches to EMs can roughly be divided into two groups, namely frequency-based EMs and probability-based EMs. Approaches not fitting these two categories are Finite State Machines [4,10,11,14] and Weighted Transition Diagrams [18,20,21,22]; discussing these in greater detail is beyond the scope of this paper, for a survey see [9]. In the following, redundancy "O(S · ...)" is w.r.t. PWSs with S segments, redundancy "O(...)" (no dependency on S) is w.r.t. a Stationary Source (a PWS with S = 1), and T is the sequence length.

Frequency-based EMs maintain approximate letter frequencies online and predict by forming relative frequencies. Typically these approximate letter frequencies are recency-weighted to place more emphasis on recent observations, which in practice often improves compression. Recency weighting can be achieved by maintaining frequencies over a sliding window [11,15,16], by resetting frequencies [18], or by scaling down frequencies at regular intervals [3,8]. The redundancy of these approaches is typically O(S · √(T log T)).
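As a concrete illustration of the third recency-weighting variant above (scaling down frequencies at regular intervals), the following is a minimal sketch, not taken from any of the cited works; the class name, the scaling interval, and the scaling factor are illustrative assumptions:

```python
class ScaledFrequencyEM:
    """Frequency-based EM sketch: predicts by relative frequencies and
    scales all counts down at regular intervals to weight recent letters
    more heavily. Parameters are illustrative, not from the cited papers."""

    def __init__(self, alphabet, interval=16, factor=0.5):
        # Initialize every count to 1 so predictions are never zero
        # (a Laplace-style prior; the cited methods may differ here).
        self.counts = {x: 1.0 for x in alphabet}
        self.interval = interval  # scale down every `interval` letters
        self.factor = factor      # multiplicative decay applied to counts
        self.t = 0                # number of letters observed so far

    def predict(self):
        # Relative frequencies form the predicted distribution.
        total = sum(self.counts.values())
        return {x: c / total for x, c in self.counts.items()}

    def update(self, y):
        self.counts[y] += 1.0
        self.t += 1
        if self.t % self.interval == 0:
            # Periodic down-scaling: older observations lose weight.
            for x in self.counts:
                self.counts[x] *= self.factor
```

After observing "a" three times over alphabet {a, b}, the counts are a: 4, b: 1, so the prediction assigns probability 0.8 to "a"; once the interval is reached, all counts halve, leaving the relative frequencies unchanged at that step but shrinking the influence of the past on future updates.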
Probability-based EMs work by directly maintaining a distribution online. A popular example is PS, which was used by most members of the PAQ family of statistical data compression algorithms. Given a PS prediction p (from the previous step or from initialization) and a new letter y, PS first shrinks the probability of every letter x to α · p(x) and then increases the probability of y by 1 − α. For a binary alphabet this approach has redundancy O(√T) [7]. An extension is to introduce a (probability) share factor parameter ε [6]: after probability shrinking, the probability of y is increased by (1 − α) · (1 − ε) and the probability of every other letter is uniformly increased by (1 − α) · ε/(N − 1), where N is the alphabet size ([6] only considered N = 2). Experiments on real-world data indicate that setting ε > 0 improves compression [6]. Note that this approach is related to share-based expert tracking [2,12] and to observation uncertainty [6]. Another approach is to apply online convex programming to estimate distributions; for example, a method based on Online Mirror Descent leads to redundancy O(log T) [13].

Our Contribution. In this work we present ...
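The PS update with share factor described above can be sketched in a few lines; the function name is an assumption, but the arithmetic follows the update rule as stated (shrink by α, then redistribute the freed mass 1 − α between the observed letter and the rest according to ε):

```python
def ps_update(p, y, alpha, eps=0.0):
    """One step of Probability Smoothing with share factor (sketch).

    p     -- current prediction, a list of probabilities summing to 1
    y     -- index of the newly observed letter
    alpha -- smoothing rate in (0, 1): all probabilities shrink to alpha * p(x)
    eps   -- share factor in [0, 1): fraction of the freed mass (1 - alpha)
             spread uniformly over the N - 1 unobserved letters;
             eps = 0 recovers plain PS
    """
    n = len(p)
    q = [alpha * px for px in p]          # shrink every letter's probability
    for x in range(n):
        if x == y:
            q[x] += (1 - alpha) * (1 - eps)        # boost the observed letter
        else:
            q[x] += (1 - alpha) * eps / (n - 1)    # share the rest uniformly
    return q
```

For example, with a binary alphabet, p = (0.5, 0.5), α = 0.9, and ε = 0, observing letter 0 yields (0.55, 0.45); with ε = 0.2 the same observation yields (0.53, 0.47), since a fifth of the freed mass goes to the unobserved letter. Both updates preserve the total probability mass of 1.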