This paper presents the data model for ORION, a prototype database system that adds persistence and sharability to objects created and manipulated in object-oriented applications. The ORION data model consolidates and modifies a number of major concepts found in many object-oriented systems, such as objects, classes, the class lattice, methods, and inheritance. These concepts are reviewed, and three major enhancements to the conventional object-oriented data model are elaborated upon: schema evolution, composite objects, and versions. Schema evolution is the ability to dynamically change the class definitions and the structure of the class lattice. Composite objects are recursive collections of exclusive components that are treated as units of storage, retrieval, and integrity enforcement. Versions are variations of the same object that are related by the history of their derivation. These enhancements are strongly motivated by the data management requirements of the ORION applications from the domains of artificial intelligence, computer-aided design and manufacturing, and office information systems with multimedia documents.
MapReduce is a programming model that is used extensively for large-scale data analysis. The join operation is one of the essential operations in such analysis. However, MapReduce performs joins inefficiently, since it always processes all records in the datasets even when only a small fraction of them is relevant to the join. We alleviate this problem by applying the bloomjoin algorithm, a classic distributed join algorithm, and improve join performance by using Bloom filters in MapReduce. In our approach, the Bloom filters are constructed in a distributed fashion and are used to filter out redundant intermediate records. To apply the Bloom filters in MapReduce, we modify Hadoop to assign the input datasets to map tasks sequentially, and we propose a method to determine the processing order of the input datasets based on their estimated cost. Our experimental results show that our architecture decreases the number of intermediate results and can improve join performance.
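The filtering idea described above can be illustrated with a minimal single-process sketch. It is not the authors' Hadoop implementation: the names `BloomFilter`, `build_filter`, and `bloom_join`, the filter size, and the hashing scheme are all assumptions chosen for illustration. The sketch builds one filter per input split of the first dataset, merges the partial filters with a bitwise OR, and uses the merged filter to drop records of the second dataset that cannot join.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # illustrative size, not taken from the paper
        self.num_hashes = num_hashes
        self.bits = 0                 # a Python int serves as the bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other):
        # Filters built by separate tasks can be merged with a bitwise OR.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_filter(splits):
    """Build one partial filter per input split, then merge them."""
    merged = BloomFilter()
    for split in splits:
        partial = BloomFilter()
        for key, _ in split:
            partial.add(key)
        merged = merged.union(partial)
    return merged


def bloom_join(first_splits, second_records):
    """Join on key; return the joined rows and the number of surviving records."""
    bloom = build_filter(first_splits)
    # Map side for the second dataset: emit only records that may join.
    survivors = [(k, v) for k, v in second_records if k in bloom]
    # Reduce side: an exact key match discards any false positives.
    index = {}
    for split in first_splits:
        for k, v in split:
            index.setdefault(k, []).append(v)
    joined = [(k, left, right) for k, right in survivors for left in index.get(k, [])]
    return joined, len(survivors)
```

Calling `bloom_join` with a small first dataset and a larger second dataset returns the joined rows together with the number of second-dataset records that passed the filter, which stands in for the intermediate records a real job would shuffle. Because a Bloom filter can report false positives but never false negatives, the exact match in the final step is still required.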