Abstract. In many data-driven applications, analysis must be performed on scientific information obtained from several sources and generated by computations on distributed resources. Systematic analysis of this scientific information creates a growing need for automated data-driven applications that can also keep track of the provenance of the data and processes with little user interaction and overhead. Such data analysis can be facilitated by recent advances in scientific workflow systems. A major benefit of using scientific workflow systems is the ability to make provenance collection a part of the workflow. Specifically, provenance should include not only the standard data lineage information but also information about the context in which the workflow was used, the execution that processed the data, and the evolution of the workflow design. In this paper we describe a complete framework for data and process provenance in the Kepler Scientific Workflow System. We outline the requirements and issues related to data and workflow provenance in a multidisciplinary workflow system and introduce how generic provenance capture can be facilitated in Kepler's actor-oriented workflow environment. We also describe the use of the stored provenance information for efficient rerun of scientific workflows.
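The idea of reusing stored provenance for an efficient rerun can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `ProvenanceStore` (not Kepler's actual API or storage format) that records each step's name, version, inputs, and output, and skips re-execution when an identical record already exists.

```python
import hashlib
import json


class ProvenanceStore:
    """Hypothetical in-memory provenance store; not Kepler's API."""

    def __init__(self):
        self.records = {}     # record key -> provenance record
        self.executions = 0   # number of times a step was actually executed

    @staticmethod
    def _key(step_name, version, inputs):
        # Identify a step invocation by its name, version, and inputs.
        payload = json.dumps([step_name, version, inputs], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name, version, func, inputs):
        key = self._key(step_name, version, inputs)
        if key in self.records:
            # Same step, same version, same inputs: reuse the recorded output.
            return self.records[key]["output"]
        output = func(**inputs)
        self.executions += 1
        self.records[key] = {
            "step": step_name,
            "version": version,
            "inputs": inputs,
            "output": output,
        }
        return output


def normalize(values):
    """Example step: scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


store = ProvenanceStore()
first = store.run("normalize", 1, normalize, {"values": [1, 3]})
second = store.run("normalize", 1, normalize, {"values": [1, 3]})  # reused
third = store.run("normalize", 2, normalize, {"values": [1, 3]})   # new version, re-executed
```

In this sketch, changing the step version (a stand-in for an evolved workflow design) invalidates the earlier record, so only the affected step is executed again.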
There is an immediate and critical need for a rapid, broad-based genotyping method that can evaluate multiple mutations simultaneously in clinical cancer specimens and identify patients most likely to benefit from targeted agents now in use or in late-stage clinical development. We have implemented a prospective genotyping approach to characterize the frequency and spectrum of mutations amenable to drug targeting present in urothelial, colorectal, endometrioid, and thyroid carcinomas and in melanoma. Cancer patients were enrolled in a Personalized Cancer Medicine Registry that houses both clinical information and genotyping data, and mutation screening was performed using a multiplexed assay panel with mass spectrometry-based analysis to detect 390 mutations across 30 cancer genes. Formalin-fixed, paraffin-embedded specimens were evaluated from 820 Registry patients. The genes most frequently mutated across multiple cancer types were BRAF, PIK3CA, KRAS, and NRAS. Less common mutations were also observed in AKT1, CTNNB1, FGFR2, FGFR3, GNAQ, HRAS, and MAP2K1. Notably, 48 of 77 PIK3CA-mutant cases (62%) harbored at least one additional mutation in another gene, most often KRAS. Among melanomas, only 54 of 73 BRAF mutations (74%) were the V600E substitution. These findings demonstrate the diversity and complexity of mutations in druggable targets among the different cancer types and underscore the need for a broad-spectrum, prospective genotyping approach to personalized cancer medicine. The identification of somatic mutations that cause aberrant activation of intracellular signaling pathways has transformed the diagnosis and treatment of cancer. Mutations in specific genes define distinct subtypes of cancer and provide invaluable markers for disease diagnosis and prognosis.
Many of the mutated proteins also represent targets for novel therapeutic agents that are more specific, more efficacious, and less toxic than broad-based chemotherapeutic regimens [1-5]. Indeed, matching the right drug to the right cancer genotype is a proven model for improving treatment and outcome in patients with chronic myelogenous leukemia (CML), non-small-cell lung carcinoma (NSCLC), gastrointestinal stromal tumor (GIST), colorectal carcinoma, and, most recently, malignant melanoma [2,5-16]. The major successes from this therapeutic approach have been in diseases in which there is limited molecular heterogeneity, with all or most cases having a drug-sensitive mutation. Prime examples include BCR-ABL in CML [9] and KIT in GIST [3,7]. There is increasing evidence that many common cancers similarly harbor potentially druggable targets, albeit at relatively lower frequencies. For example, subsets of NSCLC have oncogenic mutations in EGFR, KRAS, BRAF, PIK3CA, or HER2 or a translocation involving the ALK gene [17-19]. In the more molecularly heterogeneous cancers, mutations in proteins other than the intended therapeutic target can profoundly affect response to therapy. Thus, in the case of EGFR inhibitor therapy in lung and colon cancer, KRAS and B...
Although an increasing amount of middleware has emerged in the last few years to achieve remote data access, distributed job execution, and data management, orchestrating these technologies with minimal overhead remains a difficult task for scientists. Scientific workflow systems improve this situation by creating interfaces to a variety of technologies and automating the execution and monitoring of the workflows. Workflow systems provide domain-independent customizable interfaces and tools that combine different tools and technologies along with efficient methods for using them. As simulations and experiments move into the petascale regime, the orchestration of long-running, data- and compute-intensive tasks is becoming a major requirement for the successful steering and completion of scientific investigations. A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Kepler is a cross-project collaboration, co-founded by the SciDAC Scientific Data Management (SDM) Center, whose purpose is to develop a domain-independent scientific workflow system. It provides a workflow environment in which scientists design and execute scientific workflows by specifying the desired sequence of computational actions and the appropriate data flow, including required data transformations, between these steps. Currently deployed workflows range from local analytical pipelines to distributed, high-performance and high-throughput applications, which can be both data- and compute-intensive.
The scientific workflow approach offers a number of advantages over traditional scripting-based approaches, including ease of configuration, improved reusability and maintenance of workflows and components (called actors), automated provenance management, "smart" re-running of different versions of workflow instances, on-the-fly updateable parameters, monitoring of long-running tasks, and support for fault tolerance and recovery from failures. We present an overview of common scientific workflow requirements and their associated features that are lacking in current state-of-the-art workflow management systems. We then illustrate features of the Kepler workflow system, from both a user's and a "workflow engineer's" point of view. In particular, we highlight the use of some of the current features of Kepler in several scientific applications, as well as upcoming extensions and improvements that are geared specifically toward SciDAC user communities.
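The notion of a workflow as reusable components (actors) linked by explicit data flow can be sketched as follows. This is a simplified, push-based illustration under stated assumptions: the `Actor` class, its `connect` and `fire` methods, and the three example steps are hypothetical and do not reproduce Kepler's actual actor or director model.

```python
class Actor:
    """Hypothetical actor: applies one function to each token it receives."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.downstream = []  # actors that receive this actor's output

    def connect(self, other):
        # Link this actor's output to another actor's input.
        self.downstream.append(other)
        return other  # returning the target allows chained connections

    def fire(self, token, trace):
        result = self.func(token)
        # Record what was consumed and produced, as a simple execution trace.
        trace.append((self.name, token, result))
        if not self.downstream:
            return [result]
        outputs = []
        for actor in self.downstream:
            outputs.extend(actor.fire(result, trace))
        return outputs


# Three small steps: parse a line, convert fields to numbers, compute the mean.
read = Actor("read", lambda text: text.split(","))
convert = Actor("convert", lambda fields: [float(f) for f in fields])
mean = Actor("mean", lambda xs: sum(xs) / len(xs))
read.connect(convert).connect(mean)

trace = []
results = read.fire("1,2,3", trace)
```

Because each step is a self-contained component with a declared connection to the next, an individual actor can be swapped or reused in another pipeline without editing the others, which is the reusability argument made above in contrast to monolithic scripts.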