All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids

Moretti, Christopher; Bui, Hoang; Hollingsworth, Karen; Rich, Brandon; Flynn, Patrick J.; Thain, Douglas

doi:10.1109/tpds.2009.49

Cited by 59 publications

(42 citation statements)

References 30 publications

(33 reference statements)


“…All-pairs is presented by Moretti et al [15] for data-intensive computing on campus grids. It provides an abstraction to users to deal with All-to-All Comparison Problems.…”

Section: Related Work and Motivations (mentioning)

confidence: 99%

“…Among the existing frameworks, Hadoop [14] is popularly used to support the MapReduce computation pattern, but is inefficient in processing of All-to-All Comparison Problems. All-pairs [15] is designed for All-to-All Comparison Problems in a Campus Grid, but its application range is limited due to its brute-force data storage strategy, which stores all the data on all the worker nodes.…”

Section: Introduction (mentioning)

confidence: 99%


A distributed computing framework for All-to-All comparison problems

Zhang, Tian, Kelly, et al. 2014

IECON 2014 - 40th Annual Conference of the IEEE Industrial Electronics Society
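
The All-to-All comparison pattern named in the citation statements above can be illustrated with a minimal sequential sketch. It assumes the abstraction takes two sets and a comparison function and fills a matrix with one result per pair. The names `all_pairs` and `hamming` are hypothetical, and the sketch models neither distribution across worker nodes nor the data placement strategy that the citing authors criticize.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def all_pairs(
    set_a: Sequence[A],
    set_b: Sequence[B],
    compare: Callable[[A, B], R],
) -> List[List[R]]:
    """Return a matrix M where M[i][j] = compare(set_a[i], set_b[j])."""
    return [[compare(a, b) for b in set_b] for a in set_a]


def hamming(x: str, y: str) -> int:
    """Hypothetical comparison function: count positions where x and y differ."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


items = ["0000", "0011", "1111"]
matrix = all_pairs(items, items, hamming)
print(matrix)
```

Comparing a set against itself, as here, produces a symmetric matrix with zeros on the diagonal whenever the comparison function is a distance.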


“…First, MapReduce only handles one-dimensional input and hence is not suitable for implementing both query segmentation and database segmentation approaches. Moretti et al has reported a similar observation that MapReduce is not sufficient to express all-to-all style computation [43]. The existing MapReduce BLAST implementation, i.e., CloudBLAST [44], only implements query segmentation and stores the entire database on each node.…”

Section: Mapreduce (mentioning)

confidence: 99%

Coordinating Computation and I/O in Massively Parallel Sequence Search

Lin, Feng, et al. 2011

IEEE Trans. Parallel Distrib. Syst.


Abstract: With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence-search possesses highly irregular computation and I/O patterns. Effectively addressing these run-time irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load-balancing of computation and high-performance noncontiguous I/O.
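
The citation statement for this paper contrasts one-dimensional input with the combination of query segmentation and database segmentation. A minimal sketch of that two-dimensional decomposition follows, assuming each task pairs one chunk of queries with one fragment of the database. The helpers `split` and `task_grid` and the sample identifiers are hypothetical and are not taken from the cited paper or from any BLAST tool.

```python
from itertools import product
from typing import List, Sequence, Tuple


def split(items: Sequence[str], parts: int) -> List[List[str]]:
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def task_grid(
    queries: Sequence[str],
    database: Sequence[str],
    q_parts: int,
    db_parts: int,
) -> List[Tuple[List[str], List[str]]]:
    """Pair every query chunk with every database fragment."""
    return list(product(split(queries, q_parts), split(database, db_parts)))


queries = ["q1", "q2", "q3", "q4"]
database = ["s1", "s2", "s3", "s4", "s5", "s6"]
tasks = task_grid(queries, database, q_parts=2, db_parts=3)
for query_chunk, db_fragment in tasks:
    print(query_chunk, db_fragment)
```

Setting `db_parts=1` reduces the grid to query segmentation only, where every task receives the entire database.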


“…Porting of ccc-gistemp to other scalable systems intended for data-intensive computing such as Dryad [12], All-Pairs [17] and Pregel [16] would provide a comparative study of the various programming abstractions that are suitable. Likewise implementations of MapReduce which use existing high-performance shared filesystems are now available (e.g.…”

Section: Further Work (mentioning)

confidence: 99%

Evaluating the suitability of mapreduce for surface temperature analysis codes

Sudhakaran, Hong 2011

Proceedings of the Second International Workshop on Data Intensive Computing in the Clouds


Processing large volumes of scientific data requires an efficient and scalable parallel computing framework to obtain meaningful information quickly. In this paper, we evaluate a scientific application from the environmental sciences for its suitability to use the MapReduce framework. We consider ccc-gistemp, a Python reimplementation of the original NASA GISS model for estimating global temperature change, which takes land and ocean temperature records from different sites, removes duplicate records, and adjusts for urbanisation effects before calculating the 12-month running mean global temperature. The application consists of several stages, each displaying differing characteristics, and three stages have been ported to use Hadoop with the mrjob library. We note performance bottlenecks encountered while porting and suggest possible solutions, including modification of data access patterns to overcome uneven distribution of input data.
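
The running mean mentioned in this abstract can be sketched in a few lines. This is an illustrative example only: it assumes a trailing window over a complete monthly series with no missing values, which the abstract does not specify, and `running_mean` is a hypothetical helper rather than a function from ccc-gistemp.

```python
from typing import List, Sequence


def running_mean(values: Sequence[float], window: int = 12) -> List[float]:
    """Return the mean of each full trailing window of `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    means: List[float] = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # Drop the value that just slid out of the window.
            total -= values[i - window]
        if i >= window - 1:
            means.append(total / window)
    return means


# Thirteen made-up monthly values yield two full 12-month windows.
monthly = list(range(1, 14))
print(running_mean(monthly))
```

A series shorter than the window yields an empty list, since no full window exists.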
