SUMMARY: Life sciences research is based on individuals, often with diverse skills, assembled into research groups. These groups use their specialist expertise to address scientific problems. The in silico experiments undertaken by these research groups can be represented as workflows involving the co-ordinated use of analysis programs and information repositories that may be globally distributed. With regard to Grid computing, the requirements relate to the sharing of analysis and information resources rather than sharing computational power. The myGrid project has developed the Taverna workbench for the composition and execution of workflows for the life sciences community. This experience paper describes lessons learnt during the development of Taverna. A common theme is the importance of understanding how workflows fit into the scientists' experimental context. The lessons reflect an evolving understanding of life scientists' requirements for a workflow environment, which is relevant to other areas of data-intensive and exploratory science.
Abstract: Much has been written on the promise of Web service discovery and (semi-)automated composition. In this discussion, the value to practitioners of discovering and reusing existing service compositions, captured in workflows, is mostly ignored. This paper presents one solution to workflow discovery. Through a survey with 21 scientists and developers from the myGrid workflow environment, workflow discovery requirements are elicited. Through a user experiment with 13 scientists, an attempt is made to build a gold standard for workflow ranking. Through the design and implementation of a workflow discovery tool, a mechanism for ranking workflow fragments is provided based on graph sub-isomorphism matching. The tool evaluation, drawing on a corpus of 89 public workflows from bioinformatics and the results of the user experiment, finds that the average human ranking can largely be reproduced.
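As a rough illustration of what ranking by graph sub-isomorphism matching can look like, the sketch below treats each workflow as a directed graph whose nodes carry service labels, and checks by backtracking whether a query fragment can be embedded in a workflow. It is not the tool described in the abstract: the `Workflow` class, the `find_embedding` and `rank_workflows` functions, the service labels, and the scoring rule (fragment size divided by workflow size) are all assumptions made for the example. The match is non-induced, meaning every fragment edge must appear in the workflow, while extra workflow edges are allowed.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


class Workflow:
    """Hypothetical workflow model: node id -> service label, plus directed edges."""

    def __init__(self, labels: Dict[str, str], edges: List[Edge]):
        self.labels = labels
        self.edges: Set[Edge] = set(edges)


def find_embedding(fragment: Workflow, workflow: Workflow) -> Optional[Dict[str, str]]:
    """Return a mapping of fragment nodes to workflow nodes, or None if none exists.

    Simple backtracking search: labels must match, the mapping must be
    injective, and every fragment edge must map onto a workflow edge.
    """
    frag_nodes = list(fragment.labels)

    def extend(mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if len(mapping) == len(frag_nodes):
            return dict(mapping)
        node = frag_nodes[len(mapping)]
        for cand, label in workflow.labels.items():
            if label != fragment.labels[node] or cand in mapping.values():
                continue
            mapping[node] = cand
            # Check only the fragment edges whose endpoints are both mapped so far.
            consistent = all(
                (mapping[a], mapping[b]) in workflow.edges
                for a, b in fragment.edges
                if a in mapping and b in mapping
            )
            if consistent:
                result = extend(mapping)
                if result is not None:
                    return result
            del mapping[node]
        return None

    return extend({})


def rank_workflows(fragment: Workflow, corpus: Dict[str, Workflow]) -> List[Tuple[str, float]]:
    """Rank workflows: those containing the fragment first, smaller ones higher.

    The score is an illustrative assumption, not a published ranking function.
    """
    scores = []
    for name, wf in corpus.items():
        if find_embedding(fragment, wf) is not None:
            score = len(fragment.labels) / max(len(wf.labels), 1)
        else:
            score = 0.0
        scores.append((name, score))
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    query = Workflow({"a": "fetch", "b": "blast"}, [("a", "b")])
    corpus = {
        "wf1": Workflow({"1": "fetch", "2": "blast", "3": "render"}, [("1", "2"), ("2", "3")]),
        "wf2": Workflow({"1": "fetch", "2": "blast"}, [("1", "2")]),
    }
    for name, score in rank_workflows(query, corpus):
        print(name, round(score, 2))
```

A backtracking search like this is exponential in the worst case, which is acceptable for a corpus of a few dozen small workflows but would likely need a dedicated matcher for larger collections.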
This collection of articles on 'Workflows for e-Science' is very timely and important. Increasingly, to attack the next generation of scientific problems, multidisciplinary and distributed teams of scientists need to collaborate to make progress on these new 'Grand Challenges'. Scientists now need to access and exploit computational resources and databases that are geographically distributed through the use of high speed networks. 'Virtual Organizations' or 'VOs' must be established that span multiple administrative domains and/or institutions and which can provide appropriate authentication and authorization services and access controls to collaborating members. Some of these VOs may only have a fleeting existence but the lifetime of others may run into many years. The Grid community is attempting to develop both standards and middleware to enable both scientists and industry to build such VOs routinely and robustly. This, of course, has been the goal of research in distributed computing for many years; but now these technologies come with a new twist: service orientation. By specifying resources in terms of a service description, rather than allowing direct access to the resources, the IT industry believes that such an approach results in the construction of more robust distributed systems. The industry has therefore united around web services as the standard technology to implement such service oriented architectures and to ensure interoperability between different vendor systems. The Grid community is also now uniting in developing 'Web Service Grids' based on an underlying web service infrastructure. In addition to the security services of VOs, scientists require services that allow them to run jobs on remote computers and to access and query databases remotely. As these data analysis operations become more and more complex and repetitive, there is a need to capture and coordinate the orchestrated operations that access the resources of a VO or Grid.
Scientific workflows have therefore emerged and been adapted from the business world as a means to formalize and structure the data analysis and computations on the distributed resources. Such scientific workflows in fact
To date, on-line processes (i.e. workflows) built in e-Science have been the result of collaborative team efforts. As more of these workflows are built, scientists start sharing and reusing stand-alone compositions of services, or workflow fragments. They repurpose an existing workflow or workflow fragment by finding one that is close enough to serve as the basis of a new workflow for a different purpose, and making small changes to it. Such a "workflow by example" approach complements the popular view in the Semantic Web Services literature that on-line processes are constructed automatically from scratch, and could help bootstrap the Web of Science. Based on a comparison of e-Science middleware projects, this paper identifies seven bottlenecks to scalable reuse and repurposing. We include some thoughts on the applicability of OWL to two of these bottlenecks: workflow fragment discovery and the ranking of fragments.