SUMMARYInitially, Grid technologies were principally associated with supercomputer centres and large-scale scientific applications in physics and astronomy. They are now increasingly seen as being relevant to many areas of e-Science and e-Business. The emergence of the Open Grid Services Architecture (OGSA), to complement the ongoing activity on Web Services standards, promises to provide a service-based platform that can meet the needs of both business and scientific applications. Early Grid applications focused principally on the storage, replication and movement of file-based data. Now the need for the full integration of database technologies with Grid middleware is widely recognized. Not only do many Grid applications already use databases for managing metadata, but increasingly many are associated with large databases of domain-specific information (e.g. biological or astronomical data). This paper describes the design and implementation of OGSA-DAI, a service-based architecture for database access over the Grid. The approach involves the design of Grid Data Services that allow consumers to discover the properties of structured data stores and to access their contents. The initial focus has been on support for access to Relational and XML data, but the overall architecture has been designed to be extensible to accommodate different storage paradigms. The paper describes and motivates the design decisions that have been taken, and illustrates how the approach supports a range of application scenarios. The OGSA-DAI software is freely available from http://www.ogsadai.org.uk.
No abstract
We present dispel4py a versatile data-intensive kit presented as a standard Python library. It empowers scientists to experiment and test ideas using their familiar rapid-prototyping environment. It delivers mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, to move seamlessly into production with large-scale data loads. The mappings are fully automated, so that the encoded data analyses and data handling are completely unchanged. The underpinning model is lightweight composition of fine-grained operations on data, coupled together by data streams that use the lowest cost technology available. These fine-grained workflows are locally interpreted during development and mapped to multiple nodes and systems such as MPI and Storm for production.We explain why such an approach is becoming more essential in order that data-driven research can innovate rapidly and exploit the growing wealth of data while adapting to current technical trends. We show how provenance management is provided to improve understanding and reproducibility, and how a registry supports consistency and sharing. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved. Finally we present the next steps to achieve scalability and performance.
This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, MPI, multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. The dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and HPC architectures and consistent scalability.
OGSA-DAI (Open Grid Services Architecture Data Access and Integration) is a framework for building distributed data access and integration systems. Until recently, it lacked the built-in functionality that would allow easy creation of federations of distributed data sources. The latest release of the OGSA-DAI framework introduced the OGSA-DAI DQP (Distributed Query Processing) resource. The new resource encapsulates a distributed query processor, that is able to orchestrate distributed data sources when answering declarative user queries. The query processor has many extensibility points, making it easy to customize. We have also introduced a new OGSA-DAI VIEWS resource that provides a flexible method for defining views over relational data. The interoperability of the two new resources, together with the flexibility of the OGSA-DAI framework, allows the building of highly customized data integration solutions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.