SUMMARYLife sciences research is based on individuals, often with diverse skills, assembled into research groups. These groups use their specialist expertise to address scientific problems. The in silico experiments undertaken by these research groups can be represented as workflows involving the co-ordinated use of analysis programs and information repositories that may be globally distributed. With regards to Grid computing, the requirements relate to the sharing of analysis and information resources rather than sharing computational power. The my Grid project has developed the Taverna workbench for the composition and execution of workflows for the life sciences community. This experience paper describes lessons learnt during the development of Taverna. A common theme is the importance of understanding how workflows fit into the scientists' experimental context. The lessons reflect an evolving understanding of life scientists' requirements on a workflow environment, which is relevant to other areas of data intensive and exploratory science.
Service-based approaches (such as Web Services and the Open Grid Services Architecture) have gained considerable attention recently for supporting distributed application development in e-business and e-science. The emergence of a service-oriented view of hardware and software resources raises the question as to how database management systems and technologies can best be deployed or adapted for use in such an environment. This paper explores one aspect of service-based computing and data management, viz., how to integrate query processing technology with a service-based Grid. The paper describes in detail the design and implementation of a service-based distributed query processor for the Grid. The query processor is service-based in two orthogonal senses: firstly, it supports querying over data storage and analysis resources that are made available as services, and, secondly, its internal architecture factors out as services the functionalities related to the construction of distributed query plans on the one hand, and to their execution over the Grid on the other. The resulting system both provides a declarative approach to service orchestration in the Grid, and demonstrates how query processing can benefit from dynamic access to computational resources on the Grid.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.