We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection. The primary goal is to identify the few causal features of a high-dimensional dataset from noisy observations generated by an unknown sparse linear model. The training set, which consists of 𝑛 data samples in ℝ^𝑝, is assumed to be distributed over a large network of 𝑁 clients connected through extremely low-bandwidth links, and we consider the asymptotic regime 1 ≪ 𝑁 ≪ 𝑛 ≪ 𝑝. To infer the causal dimensions from the whole dataset, we propose a simple yet effective method for information sharing in the network. We theoretically show that the true causal features can be reliably recovered with a negligible bandwidth usage of 𝑂(𝑁 log 𝑝) across the network. This is significantly lower than the communication cost of the trivial approach of transmitting all samples to a single node (the centralized scenario), which requires 𝑂(𝑛𝑝) transmissions; even more sophisticated schemes such as ADMM still have a communication complexity of 𝑂(𝑁𝑝). Surprisingly, for a fixed per-node performance measure, our sample complexity bound is proved to be the same (up to a constant factor) as that of the optimal centralized approach, whereas the sample complexity of a naïve decentralized technique grows linearly with 𝑁. The theoretical guarantees in this paper build on the recent analytic framework of the debiased LASSO in [1], and are supported by several computer experiments on both synthetic and real-world datasets.