Data shuffling of training data among different computing nodes (workers) has been identified as a core element in improving the statistical performance of modern large-scale machine learning algorithms. Data shuffling is often considered one of the most significant bottlenecks in such systems due to the heavy communication load. Under a master-worker architecture (where a master has access to the entire dataset and only communication between the master and the workers is allowed), coding has recently been proved to considerably reduce the communication load. This work considers a different communication paradigm, referred to as decentralized data shuffling, where workers are allowed to communicate with one another via a shared link. The decentralized data shuffling problem has two phases: workers communicate with each other during the data shuffling phase, and then workers update their stored content during the storage phase. This work focuses on the case of uncoded storage (i.e., each worker directly stores a subset of the bits of the dataset).

A. Centralized Data Shuffling
A baseline for the centralized setting is the uncoded scheme of [3], in which the master simply transmits the missing but required data to the workers by directly broadcasting the missing bits over the shared link. The centralized coded data shuffling scheme with a coordinated (i.e., deterministic) uncoded storage update phase was originally proposed in [6], [7] to further reduce the communication load for the worst-case shuffles compared to [3]. The schemes proposed in [6], [7] are optimal under the constraint of uncoded storage when there is no extra memory at each worker (i.e., q = 1) or when there are at most three workers in the system. Inspired by the achievable and converse bounds for the single-bottleneck-link caching problem in [8]-[10], the authors in [11] then proposed a general coded data shuffling scheme, which was shown to be order optimal to within a factor of 2 under the constraint of uncoded storage.
Also in [11], the authors improved the performance of the general coded shuffling scheme by introducing an aligned coded delivery, which was shown to be optimal under the constraint of uncoded storage. Recently, inspired by the improved data shuffling scheme in [11], the authors in [12] proposed a linear coding scheme based on interference alignment, which achieves the optimal worst-case communication load under the constraint of uncoded storage for all system parameters. In addition, under the constraint of uncoded storage, the coded data shuffling scheme proposed in [12] was shown to be optimal for any shuffle (not just for the worst case) when q = 1.
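The gain that coded delivery offers over the uncoded baseline can be seen in a minimal toy scenario (our own illustrative sketch, not any specific scheme from [3], [6], [7], [11], or [12]): when two workers each already store the data unit the other one needs after the shuffle, the master can broadcast a single XOR of the two units instead of broadcasting both units separately, and each worker cancels its stored unit to decode.

```python
# Toy sketch of coded vs. uncoded broadcasting in centralized data shuffling.
# Assumed setup: worker 1 stores unit A but needs B; worker 2 stores B but needs A.
# Uncoded baseline: the master broadcasts A and B separately (load = 2 units).
# Coded delivery:   the master broadcasts A XOR B once  (load = 1 unit);
#                   each worker XORs out the unit it already stores.

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))

A = b"unit-A-bits"   # hypothetical data unit stored at worker 1
B = b"unit-B-bits"   # hypothetical data unit stored at worker 2

coded_msg = xor_bytes(A, B)                      # one broadcast instead of two

recovered_at_worker1 = xor_bytes(coded_msg, A)   # worker 1 cancels A, gets B
recovered_at_worker2 = xor_bytes(coded_msg, B)   # worker 2 cancels B, gets A

assert recovered_at_worker1 == B
assert recovered_at_worker2 == A
print("coded load: 1 unit; uncoded load: 2 units")
```

The same side-information idea, applied over many workers and carefully designed storage update phases, is what the coded schemes above exploit to approach the information-theoretic limits.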
B. Decentralized Data Shuffling
An important limitation of the centralized framework is the assumption that workers can only receive packets from the master. Since the entire dataset is stored in a decentralized fashion across the workers at each epoch of the distributed learning algorithm, the master may not be needed in the data shuffling phase if workers can communicate with each other (e.g., [1]). In addition, the communication among workers can be much more efficient than the communication between the master and the workers.