We focus on sorting, which is a building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named CodedTeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of CodedTeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of the CodedTeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves a 1.97× to 3.39× speedup, compared with TeraSort, for typical settings of interest.
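The coding idea can be illustrated with a toy sketch. This is not the CodedTeraSort implementation: it assumes XOR as the coding operation, and the node roles and variable names (`needed_by_2`, `needed_by_3`) are hypothetical. Because the input is stored redundantly, node 1 holds both intermediate values, while nodes 2 and 3 each already hold the value the other one needs. A single coded multicast then replaces two unicast transmissions.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical intermediate values produced by the Map stage.
# Node 1 holds both; node 2 holds only needed_by_3; node 3 holds only needed_by_2.
needed_by_2 = b"keys-for-node-2"
needed_by_3 = b"keys-for-node-3"

# Node 1 multicasts one coded packet instead of sending two separate packets.
coded = xor_bytes(needed_by_2, needed_by_3)

# Each receiver cancels the part it already knows from its redundant copy.
recovered_at_2 = xor_bytes(coded, needed_by_3)
recovered_at_3 = xor_bytes(coded, needed_by_2)
```

In this toy setting the shuffle load from node 1 is halved, at the cost of storing and mapping each piece of data on more than one node.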
Abstract: We consider a one-hop wireless system with a small number of delay-constrained users and a larger number of users without delay constraints. We develop a scheduling algorithm that reacts to time-varying channels and maximizes throughput utility (to within a desired proximity), stabilizes all queues, and satisfies the delay constraints. The problem is solved by reducing the constrained optimization to a set of weighted stochastic shortest path problems, which act as natural generalizations of max-weight policies to Markov decision networks. We also present approximation results for the corresponding shortest path problems, and discuss the additional complexity and delay incurred as compared to systems without delay constraints. The solution technique is general and applies to other constrained stochastic decision problems.
Abstract: An information collection problem in a wireless network with random events is considered. Wireless devices report on each event using one of multiple reporting formats. Each format has a different quality and a different data length. Delivering all data in the highest quality format can overload system resources. The goal is to make intelligent format selection and routing decisions to maximize time-averaged information quality subject to network stability. Lyapunov optimization theory can be used to solve such a problem by repeatedly minimizing the linear terms of a quadratic drift-plus-penalty expression. To reduce delays, this paper proposes a novel extension of this technique that preserves the quadratic nature of the drift minimization while maintaining a fully separable structure. In addition, to avoid high queuing delay, paths are restricted to at most two hops. The resulting algorithm can push average information quality arbitrarily close to optimum, with a trade-off in queue backlog. The algorithm compares favorably to the basic drift-plus-penalty scheme in terms of backlog and delay. Furthermore, the technique is generalized to solve linear programs and yields smoother results than the standard drift-plus-penalty scheme.
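For orientation, the basic drift-plus-penalty scheme that the abstract uses as its baseline can be sketched for a single queue. This is a minimal illustration under stated assumptions, not the paper's quadratic extension: the single-queue model, the constant service rate, the format table, and the names `choose_format` and `simulate` are all hypothetical. In each slot the device picks the format that maximizes `V * quality - backlog * length`, which is the linear-term minimization the abstract refers to.

```python
def choose_format(backlog, formats, V):
    """Pick the (quality, length) pair maximizing V*quality - backlog*length."""
    return max(formats, key=lambda f: V * f[0] - backlog * f[1])


def simulate(formats, service, V, slots):
    """Run the basic drift-plus-penalty rule on one queue with constant service."""
    backlog = 0.0
    total_quality = 0.0
    for _ in range(slots):
        quality, length = choose_format(backlog, formats, V)
        total_quality += quality
        # Standard queue update: serve first, then admit the new report.
        backlog = max(backlog - service, 0.0) + length
    return total_quality / slots, backlog


# Assumed formats: (quality, data length). (0, 0) means "send nothing".
FORMATS = [(0.0, 0.0), (1.0, 1.0), (1.5, 3.0)]

avg_quality, final_backlog = simulate(FORMATS, service=2.0, V=10.0, slots=1000)
```

A larger `V` weights quality more heavily and lets the backlog grow further before the rule falls back to a shorter format, which is the quality versus backlog trade-off the abstract mentions.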