In computational grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing grid programming environments stems exactly from this dynamic availability of compute cycles: grid programming environments (a) need to be portable to run on as many sites as possible, (b) need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, and (c) need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), or flexible (Jini, Java RMI), or highly efficient (MPI). No system combines all three properties that are necessary for grid computing. In this paper, we present Ibis, a new programming environment that combines Java's "run everywhere" portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to 9 times higher throughput with trees of objects.
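To make the benchmark concrete, the following is a minimal sketch, in plain Java RMI, of the kind of tree-of-objects transfer the abstract measures; the TreeNode and TreeReceiver names are illustrative assumptions, not part of Ibis itself.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// A serializable binary tree; benchmarks like the one in the abstract
// measure throughput by repeatedly shipping trees of this shape.
class TreeNode implements Serializable {
    int payload;
    TreeNode left, right;

    TreeNode(int depth) {
        payload = depth;
        if (depth > 0) {
            left = new TreeNode(depth - 1);
            right = new TreeNode(depth - 1);
        }
    }
}

// Hypothetical remote interface: each call serializes the entire tree,
// which is exactly where streaming serialization and a zero-copy
// protocol pay off compared with standard Java serialization.
interface TreeReceiver extends Remote {
    void receive(TreeNode root) throws RemoteException;
}
```

Each `receive` call forces the runtime to serialize a linked object graph, so serialization cost dominates; this is why an object-based communication layer can beat stock RMI by such a wide margin.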
Summary. In this paper, we discuss a real-world application scenario that uses three distinct types of workflow within the Triana problem-solving environment: serial scientific workflow for the data processing of gravitational wave signals; job submission workflows that execute Triana services on a testbed; and monitoring workflows that examine and modify the behaviour of the executing application. We briefly describe the Triana distribution mechanisms and the underlying architectures that we can support. Our middleware-independent abstraction layer, called the Grid Application Prototype (GAP), enables us to advertise, discover and communicate with Web and peer-to-peer (P2P) services. We show how gravitational wave search algorithms have been implemented to distribute both the search computation and data across the European GridLab testbed, using a combination of Web services, Globus interaction and P2P infrastructures.
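As a rough illustration of what a middleware-independent abstraction layer offers, here is a hypothetical GAP-style Java interface; the names and signatures below are our own assumptions for the sketch, not Triana's actual API.

```java
import java.util.List;

// Hypothetical GAP-style interface: the same advertise/discover/send
// operations can be bound to a Web services back end or a P2P back end,
// so application code stays independent of the middleware underneath.
public interface GAPService {
    /** Publish this service so registries or peers can find it. */
    void advertise(String serviceName);

    /** Locate endpoints matching a name, regardless of protocol. */
    List<String> discover(String serviceName);

    /** Send a message to a previously discovered endpoint. */
    void send(String endpoint, byte[] message);
}
```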
Abstract. In previous work we have shown that the MapReduce framework for distributed computation can be deployed for highly scalable inference over RDF graphs under the RDF Schema semantics. Unfortunately, several key optimizations that enabled the scalable RDFS inference do not generalize to the richer OWL semantics. In this paper we analyze these problems, and we propose solutions to overcome them. Our solutions allow distributed computation of the closure of an RDF graph under the OWL Horst semantics. We demonstrate the WebPIE inference engine, built on top of the Hadoop platform and deployed on a compute cluster of 64 machines. We have evaluated our approach using real-world datasets (UniProt and LDSR, about 0.9-1.5 billion triples) and a synthetic benchmark (LUBM, up to 100 billion triples). Results show that our implementation is scalable and vastly outperforms current systems in supported language expressivity, maximum data size and inference speed.
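To give a flavour of how a single inference rule maps onto Hadoop, below is a minimal, simplified sketch of the RDFS subclass rule (rdfs9: if s rdf:type C and C rdfs:subClassOf D, then s rdf:type D), in the schema-aware style where the small schema is held in memory while the large instance data streams through. One iteration only; the full OWL Horst closure needs joins and fixpoint iteration that this sketch omits, and the class names and schema loading are assumptions, not WebPIE's actual code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SubClassRule {

    public static class RuleMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        // Hypothetical in-memory schema: class -> superclass.
        private final Map<String, String> superClassOf = new HashMap<>();

        @Override
        protected void setup(Context ctx)
                throws IOException, InterruptedException {
            // In a real job the rdfs:subClassOf triples would be loaded
            // here (e.g. from the distributed cache); elided in this sketch.
            superClassOf.put("ex:Student", "ex:Person"); // illustrative entry
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Triples are assumed to be whitespace-separated "s p o" lines.
            String[] t = value.toString().split("\\s+");
            if (t.length == 3 && t[1].equals("rdf:type")) {
                String sup = superClassOf.get(t[2]);
                if (sup != null) { // derive: s rdf:type superclass
                    ctx.write(new Text(t[0] + " rdf:type " + sup),
                              NullWritable.get());
                }
            }
        }
    }

    // The reducer's only job is eliminating duplicate derivations.
    public static class DedupReducer
            extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text triple, Iterable<NullWritable> vals,
                Context ctx) throws IOException, InterruptedException {
            ctx.write(triple, NullWritable.get());
        }
    }
}
```

The abstract's point is precisely that this pattern works well for RDFS but breaks down for OWL Horst rules that join two large instance-data sets, which is what the paper's optimizations address.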
The Semantic Web contains many billions of statements, which are released using the Resource Description Framework (RDF) data model. To better handle these large amounts of data, high-performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework, and we evaluate its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly in both input size and number of nodes.

To make dictionary encoding a feasible technique on a very large input, a distributed implementation is required. To the best of our knowledge, no distributed approach exists to solve this problem. In this paper, we propose a technique to compress and decompress RDF statements using the MapReduce programming model [6]. Our approach uses a dictionary encoding technique that maintains the original structure of the data. This technique can be used by all RDF applications that need to efficiently process a large amount of data, such as RDF storage engines, network analysis tools, and reasoners. Our compression technique was essential in our recent work on Semantic Web inference engines, as it allowed us to reason directly on the compressed statements with a consequent increase of performance. As a result, we were able to reason over tens of billions of statements [7,8], advancing the current state of the art in the field significantly. The compression technique we present in this paper has the following properties: (i) performance that scales linearly; (ii) the ability to build a very large dictionary of hundreds of millions of entries; and (iii) the ability to handle load balancing issues with sampling and caching.

This paper is structured as follows. In Section 2, we discuss the conventional approach to dictionary encoding and highlight the problems that arise. Sections 3 and 4 describe how we have implemented the data compression and decompression in MapReduce. Section 5 evaluates our approach, and Section 6 describes related work. Finally, we conclude and discuss future work in Section 7.

DICTIONARY ENCODING

Dictionary encoding is often used because of its simplicity. In our case, dictionary encoding also has the additional advantage that the compressed data can still be manipulated by the application. Traditional techniques such as gzip or bzip2 hide the original data so that reading without decompression is impossible. Algorithm 1 shows a sequential algorithm to compress and decompress RDF statements. The compression algorithm starts by initializing the dictionary table. The table has two columns, one that stores the numeric identifier and one that stores the corresponding term.
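The following is a minimal sketch, under our own assumptions, of how term extraction and ID assignment could be expressed as a single Hadoop job; the paper's actual algorithm, with its sampling and caching of popular terms, is more elaborate, and the ID scheme below is one common trick rather than the paper's.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class DictionaryEncoder {

    public static class TermMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Deconstruct each "s p o" statement into its terms. Keying by
            // term sends all occurrences of a term to the same reducer;
            // the value records where the term occurred.
            String[] terms = line.toString().split("\\s+");
            for (int pos = 0; pos < terms.length; pos++) {
                ctx.write(new Text(terms[pos]), new Text(key.get() + ":" + pos));
            }
        }
    }

    public static class AssignIdReducer
            extends Reducer<Text, Text, Text, LongWritable> {
        private long counter = 0;

        @Override
        protected void reduce(Text term, Iterable<Text> occurrences,
                Context ctx) throws IOException, InterruptedException {
            // Task id in the high bits plus a local counter in the low bits
            // yields an ID that is unique across reducers with no
            // coordination.
            long taskId = ctx.getTaskAttemptID().getTaskID().getId();
            long id = (taskId << 32) | counter++;
            ctx.write(term, new LongWritable(id));
            // A second job (not shown) would rewrite the statements,
            // replacing each term with its numeric ID.
        }
    }
}
```

Because popular terms (e.g. common predicates) would overload a single reducer, the paper's load-balancing step samples such terms and encodes them via a cache instead; the sketch above deliberately leaves that out.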