In recent years, Google's MapReduce has emerged as a leading large-scale data processing architecture. Adopted for daily use by companies such as Amazon, Facebook, Google, IBM and Yahoo!, and more recently put to use by several universities, it allows parallel processing of huge volumes of data over clusters of machines. Hadoop is a free Java implementation of MapReduce. In Hadoop, files are split into blocks that are replicated and spread over all servers in a network. Each job is also split into many small pieces called tasks. Several tasks are processed on a single server, and a job is not completed until all of its assigned tasks are finished. A crucial factor that affects the completion time of a job is the particular assignment of tasks to servers. Given a placement of the input data over servers, one wishes to find the assignment that minimizes the total completion time. In this paper, an idealized Hadoop model is proposed to investigate the Hadoop task assignment problem. It is shown that there is no feasible algorithm to find the optimal Hadoop task assignment unless P = NP. Assignments that are computed by the round robin algorithm inspired by the current Hadoop scheduler are shown to deviate from optimum by a multiplicative factor in the worst case. A flow-based algorithm is presented that computes assignments that are optimal to within an additive constant.
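The task assignment problem described above can be made concrete with a small sketch. The abstract does not spell out the cost model, so the one used here is an assumption: a task costs 1 time unit on a server that holds a replica of its data and a hypothetical `remote_cost` units elsewhere, and the completion time of a job is the largest total load on any server. The `round_robin_assign` function is only a loose illustration of a round robin scheduler that prefers local tasks; it is not claimed to be the paper's exact algorithm, and all names are illustrative.

```python
from typing import Dict, Set


def makespan(assignment: Dict[str, int],
             replicas: Dict[str, Set[int]],
             num_servers: int,
             remote_cost: int = 3) -> int:
    """Completion time of a job: the maximum load over all servers.

    Assumed cost model (not from the source): a task costs 1 unit on a
    server holding its data, and `remote_cost` units on any other server.
    """
    loads = [0] * num_servers
    for task, server in assignment.items():
        loads[server] += 1 if server in replicas[task] else remote_cost
    return max(loads)


def round_robin_assign(replicas: Dict[str, Set[int]],
                       num_servers: int) -> Dict[str, int]:
    """Servers take turns; each picks a local task if one is left,
    otherwise the first unassigned task (which then runs remotely)."""
    unassigned = list(replicas)
    assignment: Dict[str, int] = {}
    server = 0
    while unassigned:
        local = [t for t in unassigned if server in replicas[t]]
        task = local[0] if local else unassigned[0]
        assignment[task] = server
        unassigned.remove(task)
        server = (server + 1) % num_servers
    return assignment


# Hypothetical data placement: task name -> servers holding its input block.
placement = {"a": {0}, "b": {0}, "c": {1}, "d": {1}}
plan = round_robin_assign(placement, num_servers=2)
print(plan, makespan(plan, placement, num_servers=2))
```

Under this assumed model, comparing `makespan` across different assignments of the same placement shows how much the choice of assignment affects job completion time.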
One of the major challenges for the deployment of underwater acoustic sensor networks is the development of a medium access control (MAC) protocol that caters for the harsh underwater environment. In particular, an underwater MAC protocol should provide high end-to-end throughput, low channel access delay, and a fair share of the scarce network bandwidth. In this paper, a cross-layer MAC protocol is proposed. It interacts with a price-based rate allocation scheme at the network layer. To accurately reflect the clique constraint of the wireless medium, the clique-based price is generalized to act as the congestion signal, which controls the end-to-end rates of multi-hop flows. The MAC protocol then schedules contention-free packet transmissions of single-hop subflows in each maximum clique. Both the MAC protocol and the rate allocation algorithm are simple and direct, and thus have low computational complexity. Through analysis and simulation, we show that the proposed MAC protocol enables multi-hop flows to acquire the max-min fair share of the network bandwidth from the end-to-end perspective.
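To illustrate what a max-min fair share under clique constraints means, here is a minimal sketch using progressive filling, a textbook centralized method. It is not the price-based distributed scheme the abstract proposes; it only computes the kind of allocation such a scheme would aim for. The representation is an assumption: each maximum clique is a mapping from a flow name to the number of that flow's single-hop subflows inside the clique, and every clique has the same hypothetical `capacity`.

```python
from typing import Dict, List


def max_min_fair_rates(cliques: List[Dict[str, int]],
                       capacity: float = 1.0) -> Dict[str, float]:
    """Progressive filling under the assumed clique constraint:
    for each clique, sum(subflow_count * flow_rate) <= capacity."""
    rates = {f: 0.0 for clique in cliques for f in clique}
    active = set(rates)  # flows whose rate can still grow
    while active:
        # Largest equal increment before some clique saturates.
        increments = []
        for clique in cliques:
            weight = sum(n for f, n in clique.items() if f in active)
            if weight > 0:
                used = sum(n * rates[f] for f, n in clique.items())
                increments.append((capacity - used) / weight)
        delta = min(increments)
        for f in active:
            rates[f] += delta
        # Freeze every flow that crosses a saturated clique.
        for clique in cliques:
            used = sum(n * rates[f] for f, n in clique.items())
            if used >= capacity - 1e-9:
                active -= set(clique)
    return rates


# Hypothetical topology: f1 has two hops in the first clique,
# f2 crosses both cliques, f3 only the second.
example = [{"f1": 2, "f2": 1}, {"f2": 1, "f3": 1}]
print(max_min_fair_rates(example))
```

In this example the first clique is the bottleneck for `f1` and `f2`, so `f3` can take up the capacity left over in the second clique without reducing any smaller rate, which is the defining property of a max-min fair allocation.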