In a network of agents, a common problem is estimating a shared underlying function from locally distributed measurements. Real-world scenarios may not allow for a centralized fusion center, which calls for distributed, message-passing implementations of standard machine learning training algorithms. In this paper, we are concerned with the distributed training of a particular class of recurrent neural networks, namely echo state networks (ESNs). In the centralized case, ESNs have received considerable attention because they can be trained with standard linear regression routines. Based on this observation, in our previous work we introduced a decentralized algorithm, framed in the distributed optimization field, to train an ESN. Here, we focus on an additional sparsity property of the output layer of ESNs, which allows for very efficient implementations of the resulting networks. To evaluate the proposed algorithm, we test it on two well-known prediction benchmarks, namely the Mackey-Glass chaotic time series and the 10th-order nonlinear autoregressive moving average (NARMA) system.
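
To make concrete the remark that a centralized ESN can be trained with standard linear regression routines, the following is a minimal sketch and not the algorithm proposed in this paper. It covers neither the distributed nor the sparse variant. The function names, the reservoir size, the spectral radius, the ridge factor, and the sine-wave toy signal are all illustrative assumptions.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def collect_states(W_in, W, inputs):
    """Run the untrained reservoir over the input sequence and store its states."""
    states = np.zeros((len(inputs), W.shape[0]))
    x = np.zeros(W.shape[0])
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-4):
    """Fit only the linear output layer by ridge regression (normal equations)."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# Toy usage: one-step-ahead prediction of a sine wave (an assumed example signal).
washout = 50  # discard initial transient states
signal = np.sin(0.1 * np.arange(1000))
W_in, W = make_reservoir(n_inputs=1, n_reservoir=50)
states = collect_states(W_in, W, signal[:-1])[washout:]
targets = signal[1:][washout:]
w_out = train_readout(states, targets)
mse = np.mean((states @ w_out - targets) ** 2)
```

Only `w_out` is learned; the reservoir weights stay fixed, which is why training reduces to a linear least-squares problem.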