The problem of distributed representation learning is one in which multiple sources of information X1, ..., XK are processed separately so as to learn as much information as possible about some ground truth Y. We investigate this problem on information-theoretic grounds, through a generalization of Tishby's centralized Information Bottleneck (IB) method to the distributed setting. Specifically, K encoders, K ≥ 2, compress their observations X1, ..., XK separately in a manner such that, collectively, the produced representations preserve as much information as possible about Y. We study both discrete memoryless (DM) and memoryless vector Gaussian data models. For the discrete model, we establish a single-letter characterization of the optimal tradeoff between complexity (or rate) and relevance (or information) for a class of memoryless sources (the observations X1, ..., XK being conditionally independent given Y). For the vector Gaussian model, we provide an explicit characterization of the optimal complexity-relevance tradeoff. Furthermore, we develop a variational bound on the complexity-relevance tradeoff which generalizes the evidence lower bound (ELBO) to the distributed setting. We also provide two algorithms for computing this bound: i) a Blahut-Arimoto type iterative algorithm, which computes optimal complexity-relevance encoding mappings by iterating over a set of self-consistent equations, and ii) a variational inference type algorithm, in which the encoding mappings are parametrized by neural networks and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on synthetic and real datasets are provided to demonstrate the effectiveness of the approaches and algorithms developed in this paper.
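As a rough illustration of the second (variational inference type) algorithm, the sketch below trains two neural encoders and a joint decoder by minimizing a cross-entropy relevance term plus a beta-weighted sum of KL complexity terms. It is not the authors' implementation: the Gaussian stochastic codes, the standard-normal prior, the layer sizes, the trade-off weight `beta`, and the synthetic batch are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code) of a distributed
# variational-IB objective with K = 2 encoders, Gaussian representations,
# a standard-normal prior, and the reparametrization trick.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps one observation X_k to the mean/log-variance of a Gaussian code U_k."""
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over code dimensions.
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)

# Two encoders (one per observation) and a joint decoder for the label Y.
x_dim, z_dim, n_classes, beta = 10, 4, 3, 1e-2   # illustrative sizes/weight
enc1, enc2 = Encoder(x_dim, z_dim), Encoder(x_dim, z_dim)
decoder = nn.Linear(2 * z_dim, n_classes)
opt = torch.optim.Adam(
    [*enc1.parameters(), *enc2.parameters(), *decoder.parameters()], lr=1e-3)

# Toy synthetic batch standing in for (X1, X2, Y).
x1, x2 = torch.randn(128, x_dim), torch.randn(128, x_dim)
y = torch.randint(0, n_classes, (128,))

for step in range(200):
    mu1, lv1 = enc1(x1)
    mu2, lv2 = enc2(x2)
    # Reparametrized samples of the representations U_1, U_2.
    u1 = mu1 + torch.randn_like(mu1) * (0.5 * lv1).exp()
    u2 = mu2 + torch.randn_like(mu2) * (0.5 * lv2).exp()
    logits = decoder(torch.cat([u1, u2], dim=1))
    # Relevance term (variational bound related to I(U_1, U_2; Y)) plus
    # beta-weighted complexity terms (upper bounds on the encoding rates).
    distortion = F.cross_entropy(logits, y)
    rate = (kl_to_standard_normal(mu1, lv1) + kl_to_standard_normal(mu2, lv2)).mean()
    loss = distortion + beta * rate
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sweeping `beta` in a sketch like this traces out an empirical complexity-relevance curve, which is the role the variational bound plays in the paper's numerical experiments.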
We study transmission over a network in which users send information to a remote destination through relay nodes that are connected to the destination via finite-capacity error-free links, i.e., a cloud radio access network. The relays are constrained to operate without knowledge of the users' codebooks, i.e., they perform oblivious processing. The destination, or central processor, however, is informed about the users' codebooks. We establish a single-letter characterization of the capacity region of this model for a class of discrete memoryless channels in which the outputs at the relay nodes are independent given the users' inputs. We show that both relaying à la Cover-El Gamal, i.e., compress-and-forward with joint decompression and decoding, and "noisy network coding" are optimal. The proof of the converse part establishes, and utilizes, connections with the Chief Executive Officer (CEO) source coding problem under the logarithmic loss distortion measure. Extensions to general discrete memoryless channels are also investigated; in this case, we establish inner and outer bounds on the capacity region. For memoryless Gaussian channels within the studied class, we characterize the capacity region when the users are constrained to time-share among Gaussian codebooks. Furthermore, we discuss the suboptimality of separate decompression-decoding and the role of time-sharing. The results in this paper have been partially presented in [1].
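For reference, the logarithmic loss distortion measure invoked in the CEO connection is the standard one: the reconstruction is a probability distribution on the alphabet of Y, and the distortion is the log of the reciprocal probability it assigns to the realized source symbol, as recalled in the short LaTeX snippet below (standard definition, not taken verbatim from the paper).

```latex
% Logarithmic loss (log-loss) distortion: the reconstruction \hat{y} is a
% probability distribution on the source alphabet \mathcal{Y}, and the
% distortion incurred when the realized symbol is y is
d_{\log}(y, \hat{y}) \;=\; \log \frac{1}{\hat{y}(y)} .
```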