Data shuffling of training data among different computing nodes (workers) has been identified as a core element in improving the statistical performance of modern large-scale machine learning algorithms. Data shuffling is often considered one of the most significant bottlenecks in such systems due to the heavy communication load. Under a master-worker architecture (where a master has access to the entire dataset and only communication between the master and the workers is allowed), coding has recently been proved to considerably reduce the communication load. This work considers a different communication paradigm, referred to as decentralized data shuffling, where workers are allowed to communicate with one another via a shared link. The decentralized data shuffling problem has two phases: workers communicate with each other during the data shuffling phase, and then workers update their stored content during the storage phase. This work focuses on the case of uncoded storage (i.e., each worker directly stores a subset of the bits of the dataset).

A. Centralized Data Shuffling
A baseline for the centralized setting is the uncoded scheme of [3], in which the master simply transmits the missing but required data to the workers by directly broadcasting the missing bits over the shared link. The centralized coded data shuffling scheme with a coordinated (i.e., deterministic) uncoded storage update phase was originally proposed in [6], [7] to further reduce the communication load for the worst-case shuffles compared to [3]. The schemes proposed in [6], [7] are optimal under the constraint of uncoded storage when there is no extra memory at each worker (i.e., q = 1) or when there are at most three workers in the system. Inspired by the achievable and converse bounds for the single-bottleneck-link caching problem in [8]-[10], the authors in [11] then proposed a general coded data shuffling scheme, which was shown to be order optimal to within a factor of 2 under the constraint of uncoded storage.
Also in [11], the authors improved the performance of the general coded shuffling scheme by introducing an aligned coded delivery, which was shown to be optimal under the constraint of uncoded storage. Recently, inspired by the improved data shuffling scheme in [11], the authors in [12] proposed a linear coding scheme based on interference alignment, which achieves the optimal worst-case communication load under the constraint of uncoded storage for all system parameters. In addition, under the constraint of uncoded storage, the coded data shuffling scheme proposed in [12] was shown to be optimal for any shuffle (not just for the worst case) when q = 1.
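The gain that coded delivery offers over the uncoded baseline can be seen in a minimal toy scenario (our own illustrative sketch, not any specific scheme from [3], [6], [7], [11], or [12]): when two workers each already store the data unit the other one needs after the shuffle, the master can broadcast a single XOR of the two units instead of broadcasting both units separately, and each worker cancels its stored unit to decode.

```python
# Toy sketch of coded vs. uncoded broadcasting in centralized data shuffling.
# Assumed setup: worker 1 stores unit A but needs B; worker 2 stores B but needs A.
# Uncoded baseline: the master broadcasts A and B separately (load = 2 units).
# Coded delivery:   the master broadcasts A XOR B once  (load = 1 unit);
#                   each worker XORs out the unit it already stores.

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))

A = b"unit-A-bits"   # hypothetical data unit stored at worker 1
B = b"unit-B-bits"   # hypothetical data unit stored at worker 2

coded_msg = xor_bytes(A, B)                      # one broadcast instead of two

recovered_at_worker1 = xor_bytes(coded_msg, A)   # worker 1 cancels A, gets B
recovered_at_worker2 = xor_bytes(coded_msg, B)   # worker 2 cancels B, gets A

assert recovered_at_worker1 == B
assert recovered_at_worker2 == A
print("coded load: 1 unit; uncoded load: 2 units")
```

The same side-information idea, applied over many workers and carefully designed storage update phases, is what the coded schemes above exploit to approach the information-theoretic limits.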
B. Decentralized Data Shuffling
An important limitation of the centralized framework is the assumption that workers can only receive packets from the master. Since the entire dataset is stored in a decentralized fashion across the workers at each epoch of the distributed learning algorithm, the master may not be needed in the data shuffling phase if workers can communicate with each other (e.g., [1]). In addition, the communication among workers can be much more efficient than the communication between the master and the workers.