Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
We study the approximate message-passing decoder for sparse superposition coding on the additive white Gaussian noise channel and extend our preliminary work [1]. We use heuristic statistical-physics-based tools such as the cavity and the replica methods for the statistical analysis of the scheme. While superposition codes asymptotically reach the Shannon capacity, we show that our iterative decoder is limited by a phase transition similar to the one that happens in Low Density Parity check codes. We consider two solutions to this problem, that both allow to reach the Shannon capacity: i) a power allocation strategy and ii) the use of spatial coupling, a novelty for these codes that appears to be promising. We present in particular simulations suggesting that spatial coupling is more robust and allows for better reconstruction at finite code lengths. Finally, we show empirically that the use of a fast Hadamard-based operator allows for an efficient reconstruction, both in terms of computational time and memory, and the ability to deal with very large messages.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.