Tak-Lon Wu scite author profile

Tak-Lon Wu

4 Publications

159 Citation Statements Received

42 Citation Statements Given

Affiliations

Amazon (United States), Indiana University Bloomington, NuVasive (United States)

Publications

MapReduce in the Clouds for Science

Gunarathne, Qiu, et al. 2010

120

Abstract: The utility computing model introduced by cloud computing combined with the rich set of cloud infrastructure services offers a very viable alternative to traditional servers and computing clusters. MapReduce distributed data processing architecture has become the weapon of choice for data-intensive analyses in the clouds and in commodity clusters due to its excellent fault tolerance features, scalability and the ease of use. Currently, there are several options for using MapReduce in cloud environments, such as using MapReduce as a service, setting up one's own MapReduce cluster on cloud instances, or using specialized cloud MapReduce runtimes that take advantage of cloud infrastructure services. In this paper, we introduce AzureMapReduce, a novel MapReduce runtime built using the Microsoft Azure cloud infrastructure services. AzureMapReduce architecture successfully leverages the high latency, eventually consistent, yet highly scalable Azure infrastructure services to provide an efficient, on demand alternative to traditional MapReduce clusters. Further, we evaluate the use and performance of MapReduce frameworks, including AzureMapReduce, in cloud environments for scientific applications using sequence assembly and sequence alignment as use cases.

Cloud computing paradigms for pleasingly parallel biomedical applications

Gunarathne, Choi, et al. 2011

Concurrency and Computation

Cloud computing offers new approaches for scientific computing that leverage the major commercial hardware and software investment in this area. Closely coupled applications are still unclear in clouds as synchronization costs are still higher than on optimized MPI machines. However, loosely coupled problems are very important in many fields and can achieve good cloud performance even when pleasingly parallel steps are followed by reduction operations as supported by MapReduce. However, we can use clouds in several ways, and here we compare four different approaches using two biomedical applications. We look at the cloud infrastructure service based virtual machine utility computing models of Amazon AWS and Microsoft Windows Azure, and at the MapReduce-based computing frameworks Apache Hadoop (deployed on raw hardware as well as on virtual machines) and Microsoft DryadLINQ. We compare performance, showing strong variations in cost between different EC2 machine choices and comparable performance between the utility computing (spawn off a set of jobs) and managed parallelism (MapReduce). The MapReduce approach offered the most user friendly approach.

Hybrid cloud and cluster computing paradigms for life science applications

et al. 2010

Background: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister.

Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.

Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications.

Methods: We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

Scalable parallel computing on clouds using Twister4Azure iterative MapReduce

Gunarathne, Zhang, et al. 2013

Future Generation Computer Systems

