Cloud computing has become mainstream for High Performance Computing (HPC) application development over the last few years. However, even though many vendors have rolled out commercial cloud infrastructures, their service offerings are usually best-effort only, without any performance guarantees. Cloud computing saves the eScience developer the hassle of resource provisioning, but these resources are of questionable use if they cannot meet the performance expectations of deployed applications. Furthermore, to make application design choices for a particular cloud offering, an eScience developer needs to understand the performance capabilities of the underlying cloud platform. Among all clouds, the emerging Azure cloud from Microsoft remains a challenge for HPC program development, both because it lacks support for traditional parallel programming models such as MPI and MapReduce and because its APIs are still evolving. To aid HPC developers, we present an open-source benchmark suite, AzureBench, for the Windows Azure cloud platform. We report a comprehensive performance analysis of the Azure cloud platform's storage services, which are its primary artifacts for inter-processor coordination and communication. We also report how much scalability the Azure platform affords using up to 100 processors and point out various bottlenecks in parallel access to storage services. The paper also offers pointers for overcoming the steep learning curve of HPC application development on Azure. Finally, we provide an open-source generic application framework that can serve as a starting point for developing bag-of-tasks applications on Azure.
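The bag-of-tasks pattern mentioned above, in which workers coordinate only through shared storage, can be illustrated with a minimal sketch. This is not the framework described in the abstract and does not use any Azure API: Python's in-process `queue.Queue` stands in for a cloud task queue, a dictionary stands in for result storage, and the names `run_bag_of_tasks` and `work_fn` are hypothetical.

```python
import queue
import threading


def run_bag_of_tasks(tasks, work_fn, num_workers=4):
    """Run independent tasks through a shared queue; return results in task order.

    Assumption: queue.Queue stands in for a cloud task queue and the
    `results` dict stands in for cloud result storage. Neither is an Azure API.
    """
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    # Producer step: enqueue every task with an id so results can be reordered.
    for task_id, payload in enumerate(tasks):
        task_queue.put((task_id, payload))

    def worker():
        # Each worker pulls tasks until the queue is empty; tasks are
        # independent, so workers never talk to each other directly.
        while True:
            try:
                task_id, payload = task_queue.get_nowait()
            except queue.Empty:
                return
            output = work_fn(payload)
            with results_lock:
                results[task_id] = output

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    return [results[i] for i in range(len(tasks))]
```

For example, `run_bag_of_tasks([1, 2, 3], lambda x: x * x)` squares each input on whichever worker dequeues it. Because all coordination goes through the queue, the queue's throughput under parallel access bounds how well such a design scales, which is the kind of bottleneck a storage benchmark is meant to expose.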
Polygon overlay is one of the complex operations in computational geometry. It is applied in many fields, such as Geographic Information Systems (GIS), computer graphics, and VLSI CAD. Sequential algorithms for this problem abound in the literature, but distributed algorithms are lacking, especially for the MapReduce platform. In GIS, spatial data files tend to be large (gigabytes in size), and the underlying overlay computation is highly irregular and compute-intensive. The MapReduce paradigm is now standard in industry and academia for processing large-scale data. Motivated by the MapReduce programming model, we revisit the distributed polygon overlay problem and its implementation on the MapReduce platform. Our algorithms are geared towards maximizing local processing and minimizing the communication overhead inherent in the shuffle and sort phases of MapReduce. We experimented with two data sets and achieved up to 22x speedup with data set 1 using 64 CPU cores.
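One common way to cast an overlay as map, shuffle, and reduce steps is uniform grid partitioning, sketched below. The abstract does not say which partitioning the authors use, so this is an assumed illustration rather than their algorithm. To stay self-contained it uses axis-aligned rectangles `(xmin, ymin, xmax, ymax)` in place of general polygons, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def cells_for(rect, cell_size):
    """Return the grid cells that a rectangle's extent touches."""
    xmin, ymin, xmax, ymax = rect
    xs = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    ys = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return product(xs, ys)


def map_phase(layer_name, rects, cell_size):
    """Map: emit (cell, tagged rectangle) for every cell a rectangle touches."""
    for index, rect in enumerate(rects):
        for cell in cells_for(rect, cell_size):
            yield cell, (layer_name, index, rect)


def intersect(a, b):
    """Intersection of two rectangles, or None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def grid_overlay(base_layer, overlay_layer, cell_size=10.0):
    """Overlay two layers; return {(base_index, overlay_index): intersection}."""
    # Shuffle: group the emitted values by grid cell.
    groups = defaultdict(list)
    for layer_name, rects in (("base", base_layer), ("overlay", overlay_layer)):
        for cell, value in map_phase(layer_name, rects, cell_size):
            groups[cell].append(value)

    # Reduce: each cell is processed locally, pairing only shapes that share it.
    results = {}
    for values in groups.values():
        bases = [v for v in values if v[0] == "base"]
        overlays = [v for v in values if v[0] == "overlay"]
        for (_, i, a), (_, j, b) in product(bases, overlays):
            if (i, j) in results:
                continue  # pair already handled in another shared cell
            clipped = intersect(a, b)
            if clipped is not None:
                results[(i, j)] = clipped
    return results
```

The cell size controls the trade-off the abstract alludes to: larger cells mean fewer duplicated shapes crossing the shuffle but more pairwise work per reducer, while smaller cells do the opposite.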