We propose geometric weighting as a novel method for combining multiple models in data compression. Our results reveal the rationale behind PAQ weighting and generalize it to non-binary alphabets. Based on a similar technique, we present a new, generic linear mixture technique. Both mixture techniques rely on given weight vectors. We consider the problem of finding optimal weights and show that weight optimization leads to a strictly convex (and thus well-behaved) optimization problem. Finally, an experimental evaluation compares the two mixture techniques on a binary alphabet; the results indicate that geometric weighting is superior to linear weighting.
Comment: Data Compression Conference (DCC) 201
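To make the distinction above concrete, the following sketch (binary alphabet, Python, illustrative names and learning rate) contrasts a linear mixture with a geometric, PAQ-style logistic mixture and shows one online gradient step on the log loss. It is a minimal illustration under these assumptions, not the paper's reference implementation.

```python
import math

def stretch(p):
    """Logit: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic sigmoid, the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_mix(probs, weights):
    """Linear mixture; weights assumed non-negative and summing to 1."""
    return sum(w * p for w, p in zip(weights, probs))

def geometric_mix(probs, weights):
    """Geometric (logistic, PAQ-style) mixture of binary predictions."""
    return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))

def update_geometric(weights, probs, bit, lr=0.02):
    """One online gradient-descent step on the log loss of the mixture."""
    p = geometric_mix(probs, weights)
    return [w + lr * (bit - p) * stretch(q) for w, q in zip(weights, probs)]
```

The update in `update_geometric` is the log-loss gradient of the geometric mixture; summing that loss over a sequence yields a convex objective in the weights, of the kind referred to above.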
In this paper we present an approach to modelling nonstationary binary sequences, i.e., to predicting the probability of upcoming symbols. After describing the prediction model we evaluate its performance in two non-artificial test cases. First, the model is compared to the Laplace and Krichevsky-Trofimov estimators. Second, a statistical ensemble model for compressing Burrows-Wheeler transform output is developed and evaluated. A systematic approach to parameter optimization for both an individual model and the ensemble model is also given.
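For reference, the two baselines named above have simple closed forms. The sketch below (Python, hypothetical count variables) shows the Laplace and Krichevsky-Trofimov estimators together with a generic count-discounting update that merely stands in for "a nonstationary estimator"; it is not the specific model developed in the paper.

```python
def laplace(ones, zeros):
    """Laplace estimator: add-one smoothing for P(next bit = 1)."""
    return (ones + 1.0) / (ones + zeros + 2.0)

def kt(ones, zeros):
    """Krichevsky-Trofimov estimator: add-half smoothing."""
    return (ones + 0.5) / (ones + zeros + 1.0)

def discounted_update(ones, zeros, bit, discount=0.98):
    """Generic nonstationary variant: old counts decay before the new
    symbol is counted, so recent symbols dominate the estimate."""
    ones, zeros = ones * discount, zeros * discount
    if bit:
        ones += 1.0
    else:
        zeros += 1.0
    return ones, zeros
```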
This paper presents Gated Linear Networks (GLNs), a new family of backpropagation-free neural architectures. What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism: each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model nonlinear functions via data-dependent gating in conjunction with online convex optimization. We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable to deep ReLU networks. Furthermore, we demonstrate that the GLN learning mechanism is remarkably resilient to catastrophic forgetting, performing almost on par with an MLP trained with dropout and Elastic Weight Consolidation on standard benchmarks.
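As a rough illustration of the mechanism described above, here is a minimal sketch of a single GLN-style neuron for a binary target. It assumes random half-space gating on a side-information vector and a plain log-loss gradient step; the learning rate, number of gates, and the omission of weight clipping and bias inputs are simplifications for illustration, not the published formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedNeuron:
    """Sketch of one GLN-style neuron for a binary target.

    Random hyperplanes over the side information z implement data-dependent
    gating: they select one of 2**n_gates weight vectors. The neuron then
    geometrically mixes its input probabilities under the selected weights
    and is trained locally by gradient descent on its own log loss, so no
    backpropagated error signal is required.
    """

    def __init__(self, n_inputs, side_dim, n_gates=4, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_gates, side_dim))  # gating hyperplanes
        self.weights = np.full((2 ** n_gates, n_inputs), 1.0 / n_inputs)
        self.lr = lr

    def _context(self, z):
        """Index of the weight vector selected by half-space gating on z."""
        bits = (self.planes @ z > 0).astype(int)
        return int(bits @ (1 << np.arange(bits.size)))

    def predict(self, probs, z):
        """Geometric mixture of input probabilities (numpy array in (0, 1))."""
        logits = np.log(probs / (1.0 - probs))
        c = self._context(z)
        return sigmoid(self.weights[c] @ logits), logits, c

    def update(self, probs, z, target):
        """Local online update: log-loss gradient step on the active weights."""
        p, logits, c = self.predict(probs, z)
        self.weights[c] += self.lr * (target - p) * logits
        return p
```

Stacking layers of such neurons, with each layer's outputs serving as the next layer's input probabilities, gives the full network in this simplified view.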
Probability estimation is an elementary building block of every statistical data compression algorithm. In practice, probability estimation is often based on relative letter frequencies which are scaled down when their sum grows too large. Such algorithms are attractive in terms of memory requirements, running time and practical performance. However, a theoretical understanding is still lacking. In this work we formulate a typical probability estimation algorithm based on relative frequencies and frequency discount, Algorithm RFD. Our main contribution is its theoretical analysis. We show that Algorithm RFD performs almost as well as any piecewise stationary model with either bounded or unbounded letter probabilities. This theoretically confirms the recency effect of periodic frequency discount, which has often been observed empirically.
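The following sketch illustrates the kind of algorithm described above: a relative-frequency estimator that scales its counts down once their sum exceeds a threshold. The threshold, discount factor and initial pseudo-counts here are illustrative assumptions, not the parameters of Algorithm RFD as analyzed in the paper.

```python
def rfd_like_estimator(alphabet_size, threshold=1024, discount=0.5):
    """Relative-frequency estimation with periodic frequency discount.

    A sketch in the spirit of Algorithm RFD: symbol probabilities are
    relative counts, and all counts are scaled down whenever their sum
    exceeds a threshold, which emphasizes recent statistics."""
    counts = [1.0] * alphabet_size  # illustrative uniform pseudo-counts

    def probability(symbol):
        return counts[symbol] / sum(counts)

    def update(symbol):
        counts[symbol] += 1.0
        if sum(counts) > threshold:
            for i in range(alphabet_size):
                counts[i] *= discount

    return probability, update
```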