We prove that, at least for the binary erasure channel, the polar-coding paradigm gives rise to codes that not only approach the Shannon limit but, in fact, do so under the best possible scaling of their block length as a function of the gap to capacity. This result exhibits the first known family of binary codes that attains both optimal scaling and quasi-linear complexity of encoding and decoding. Specifically, for any fixed δ > 0, we exhibit binary linear codes that ensure reliable communication at rates within ε > 0 of capacity with block length n = O(1/ε^(2+δ)), construction complexity Θ(n), and encoding/decoding complexity Θ(n log n).

Our proof is based on the construction and analysis of binary polar codes with large kernels. It was recently shown that, for all binary-input symmetric memoryless channels, conventional polar codes (based on a 2 × 2 kernel) allow reliable communication at rates within ε > 0 of capacity with block length, construction, encoding, and decoding complexity all bounded by a polynomial in 1/ε. In particular, this means that the block length n scales as O(1/ε^µ), where the constant µ is called the scaling exponent. It is furthermore known that the optimal scaling exponent is µ = 2, and that it is achieved by random linear codes. However, for general channels, the decoding complexity of random linear codes is exponential in the block length. As for conventional polar codes, their scaling exponent depends on the channel; for the binary erasure channel it is given by µ ≈ 3.63, which falls far short of the optimal scaling guaranteed by random codes.

Our main contribution is a rigorous proof of the following result: there exist ℓ × ℓ binary kernels such that polar codes constructed from these kernels achieve a scaling exponent µ(ℓ) that tends to the optimal value of 2 as ℓ grows. We furthermore characterize precisely how large ℓ needs to be as a function of the gap between µ(ℓ) and 2. The resulting binary codes maintain the beautiful recursive structure of conventional polar codes, and thereby achieve construction complexity Θ(n) and encoding/decoding complexity Θ(n log n). This implies that block length, construction, encoding, and decoding complexity are all linear or quasi-linear in 1/ε^2, which meets the information-theoretic lower bound.
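To make the polarization mechanism behind these statements concrete, the following minimal sketch (illustrative only, not taken from the paper) tracks the erasure probabilities of the synthetic channels produced by the conventional 2 × 2 kernel on a BEC: one polarization step maps erasure probability z to the pair (z², 2z − z²), and the scaling exponent quantifies how fast this polarization sharpens as the block length grows.

```python
# Minimal sketch (illustrative only): polarization of a BEC(z0) under the
# conventional 2 x 2 kernel. One step maps erasure probability z to the pair
# (z**2, 2*z - z**2), i.e., one better and one worse synthetic channel.
def polarize(z0, steps):
    channels = [z0]
    for _ in range(steps):
        nxt = []
        for z in channels:
            nxt.append(z * z)          # "good" synthetic channel
            nxt.append(2 * z - z * z)  # "bad" synthetic channel
        channels = nxt
    return channels

if __name__ == "__main__":
    z0, steps = 0.5, 10                # 2**10 = 1024 synthetic channels
    zs = polarize(z0, steps)
    frac_good = sum(z < 1e-3 for z in zs) / len(zs)
    frac_bad = sum(z > 1 - 1e-3 for z in zs) / len(zs)
    # As steps grow, almost every channel becomes nearly perfect or nearly
    # useless, and the fraction of good ones approaches the capacity 1 - z0.
    print(f"near-perfect: {frac_good:.3f}, near-useless: {frac_bad:.3f}")
```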
We consider the problem of decentralized consensus optimization, where the sum of n smooth and strongly convex functions is minimized over n distributed agents that form a connected network. In particular, we consider the case in which the local decision variables communicated among nodes are quantized in order to alleviate the communication bottleneck in distributed optimization. We propose the Quantized Decentralized Gradient Descent (QDGD) algorithm, in which nodes update their local decision variables by combining the quantized information received from their neighbors with their local information. We prove that, under standard strong convexity and smoothness assumptions on the objective function, QDGD achieves a vanishing mean solution error under customary conditions on the quantizers. To the best of our knowledge, this is the first algorithm that achieves vanishing consensus error in the presence of quantization noise. Moreover, we provide simulation results that show tight agreement between our derived theoretical convergence rate and the numerical results.
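As a rough illustration of the kind of update QDGD performs (a sketch under assumptions: the unbiased stochastic quantizer, the mixing matrix W, the step sizes, and all names below are illustrative choices, not necessarily the paper's exact rule), each node mixes quantized copies of its neighbors' iterates with its own iterate and then takes a local gradient step:

```python
import numpy as np

def quantize(x, levels=16, scale=1.0):
    """Unbiased stochastic (dithered) quantizer: E[Q(x)] = x elementwise."""
    step = 2 * scale / levels
    low = np.floor(x / step) * step
    prob_up = (x - low) / step
    return low + step * (np.random.rand(*x.shape) < prob_up)

def qdgd_step(x, grads, W, alpha, eps):
    """
    One illustrative QDGD-style round (a sketch, not the paper's exact rule).
    x:     (n, d) current local iterates, one row per node
    grads: (n, d) local gradients evaluated at the rows of x
    W:     (n, n) doubly stochastic mixing matrix of the network
    alpha: local gradient step size; eps: consensus (mixing) step size
    """
    q = quantize(x)                       # nodes exchange quantized iterates
    mixed = (1 - eps) * x + eps * (W @ q) # combine own iterate with neighbors'
    return mixed - alpha * grads          # local gradient step
```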
This paper considers stochastic optimization problems for a large class of objective functions, including convex and continuous submodular ones. Stochastic proximal gradient methods have been widely used to solve such problems; however, their applicability remains limited when the problem dimension is large and the projection onto a convex set is computationally costly. Instead, we propose stochastic conditional gradient algorithms as an alternative, which rely on (i) approximating gradients via a simple averaging technique requiring a single stochastic gradient evaluation per iteration, and (ii) solving a linear program to compute the descent/ascent direction. The gradient averaging technique reduces the noise of the gradient approximations as time progresses, and replacing the projection step of proximal methods with a linear program lowers the computational complexity of each iteration. We show that, under convexity and smoothness assumptions, our proposed stochastic conditional gradient method converges to the optimal objective value at a sublinear rate of O(1/t^(1/3)). Further, for a monotone and continuous DR-submodular function subject to a general convex body constraint, we prove that our proposed method achieves a ((1 − 1/e)OPT − ε) guarantee (in expectation) with O(1/ε^3) stochastic gradient computations. This guarantee matches the known hardness results and closes the gap between deterministic and stochastic continuous submodular maximization. Additionally, we achieve a ((1/e)OPT − ε) guarantee after O(1/ε^3) stochastic gradient computations for the case in which the objective function is continuous DR-submodular but non-monotone and the constraint set is a down-closed convex body. By using stochastic continuous optimization as an interface, we also provide the first tight (1 − 1/e) approximation guarantee for maximizing a monotone but stochastic submodular set function subject to a general matroid constraint, and a (1/e) approximation guarantee for the non-monotone case. Numerical experiments for both the convex and submodular settings illustrate the fast convergence of our proposed stochastic conditional gradient method relative to alternatives.
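The gradient-averaging idea in the convex case can be sketched as follows (the feasible set, step-size schedules, and toy objective are assumptions made for illustration): the direction is a running average updated with a single fresh stochastic gradient per iteration, and the linear minimization oracle over the probability simplex reduces to picking the best vertex.

```python
import numpy as np

def stochastic_frank_wolfe(stoch_grad, d, T, rho=lambda t: 1.0 / (t + 1) ** (2 / 3)):
    """
    Illustrative stochastic conditional gradient (Frank-Wolfe) method for
    minimizing a smooth convex function over the probability simplex, using a
    running average of stochastic gradients instead of exact gradients.
    stoch_grad(x) must return an unbiased stochastic gradient at x.
    """
    x = np.full(d, 1.0 / d)        # start at the center of the simplex
    g_avg = np.zeros(d)            # averaged gradient direction
    for t in range(T):
        g_avg = (1 - rho(t)) * g_avg + rho(t) * stoch_grad(x)  # one sample/step
        v = np.zeros(d)
        v[np.argmin(g_avg)] = 1.0  # LMO over the simplex: best vertex
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * v
    return x

# Toy usage: least squares with noisy gradients, minimized over the simplex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
noisy_grad = lambda x: 2 * A.T @ (A @ x - b) + rng.standard_normal(10)
x_hat = stochastic_frank_wolfe(noisy_grad, d=10, T=2000)
```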
Consider a binary linear code of length N, minimum distance d_min, transmission over the binary erasure channel with parameter 0 < ε < 1 or the binary symmetric channel with parameter 0 < ε < 1/2, and block-MAP decoding. It was shown by Tillich and Zémor that in this case the error probability of the block-MAP decoder transitions "quickly" from δ to 1 − δ for any δ > 0 if the minimum distance is large. In particular, the width of the transition is of order O(1/√d_min). We strengthen this result by showing that, under suitable conditions on the weight distribution of the code, the transition width can be as small as Θ(1/N^(1/2 − κ)) for any κ > 0, even if the minimum distance of the code is not linear in N. This condition applies, e.g., to Reed–Muller codes. Since Θ(1/N^(1/2)) is the smallest transition width possible for any code, we speak of "almost" optimal scaling. We emphasize that the width of the transition says nothing about its location; therefore this result has no bearing on whether a code is capacity-achieving or not. As a second contribution, we present a new estimate on the derivative of the EXIT function, the proof of which is based on the Blowing-Up Lemma.
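To illustrate what such a transition looks like, here is a small Monte Carlo sketch (an illustration, not part of the paper) for the [8, 4, 4] Reed–Muller code RM(1, 3) over the BEC. Block-MAP decoding over the erasure channel succeeds exactly when the columns of the generator matrix at the non-erased positions have full rank over GF(2), so the failure probability can be estimated by sampling erasure patterns; for longer codes the same curve sharpens, and the result above quantifies how fast its width shrinks with N.

```python
import numpy as np

# Generator matrix of RM(1, 3), the [8, 4, 4] extended Hamming code.
G = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
              [0, 1, 0, 1, 0, 1, 0, 1],
              [0, 0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

def gf2_rank(M):
    """Rank over GF(2) via Gaussian elimination."""
    M = (M % 2).astype(np.uint8).copy()
    rank, (rows, cols) = 0, M.shape
    for c in range(cols):
        pivots = np.nonzero(M[rank:, c])[0]
        if pivots.size == 0:
            continue
        p = rank + pivots[0]
        M[[rank, p]] = M[[p, rank]]                      # move pivot row up
        mask = (M[:, c] == 1) & (np.arange(rows) != rank)
        M[mask] ^= M[rank]                               # eliminate column c
        rank += 1
        if rank == rows:
            break
    return rank

def block_map_error(eps, trials=10000):
    """Monte Carlo estimate of P(block-MAP failure) over BEC(eps)."""
    k, n = G.shape
    fails = 0
    for _ in range(trials):
        kept = np.random.rand(n) >= eps         # non-erased positions
        fails += gf2_rank(G[:, kept]) < k       # failure iff rank-deficient
    return fails / trials

for eps in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(eps, block_map_error(eps))
```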