OGSA-DAI (Open Grid Services Architecture Data Access and Integration) is a framework for building distributed data access and integration systems. Until recently, it lacked built-in functionality for easily creating federations of distributed data sources. The latest release of the OGSA-DAI framework introduced the OGSA-DAI DQP (Distributed Query Processing) resource. This resource encapsulates a distributed query processor that can orchestrate distributed data sources when answering declarative user queries. The query processor has many extensibility points, which make it easy to customize. We have also introduced a new OGSA-DAI VIEWS resource that provides a flexible way to define views over relational data. The interoperability of the two new resources, together with the flexibility of the OGSA-DAI framework, makes it possible to build highly customized data integration solutions.
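To make the idea of orchestrating several sources concrete, here is a minimal sketch of what a distributed query processor does for a two-source join: it pushes a sub-query (including any selection) to each source, ships the partial results back, and joins them at a coordinator. This is not the OGSA-DAI DQP API; the two in-memory SQLite databases, their tables, and the `make_source` and `federated_join` helpers are hypothetical stand-ins for remote data resources, and a simple hash join is assumed as the join strategy.

```python
import sqlite3
from collections import defaultdict


def make_source(ddl, rows, insert_sql):
    """Create an in-memory SQLite database standing in for one (hypothetical) remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


def federated_join(left_conn, left_sql, right_conn, right_sql, left_key, right_key):
    """Run one sub-query per source, then hash-join the partial results at the coordinator."""
    left_rows = left_conn.execute(left_sql).fetchall()
    right_rows = right_conn.execute(right_sql).fetchall()
    # Build phase: index the right-hand rows by their join-key column.
    index = defaultdict(list)
    for rrow in right_rows:
        index[rrow[right_key]].append(rrow)
    # Probe phase: emit a concatenated tuple for every matching pair.
    return [lrow + rrow for lrow in left_rows for rrow in index.get(lrow[left_key], [])]


# Hypothetical sources: patients held in one database, samples in another.
patients = make_source(
    "CREATE TABLE patient (id INTEGER, name TEXT)",
    [(1, "ann"), (2, "bob"), (3, "cat")],
    "INSERT INTO patient VALUES (?, ?)",
)
samples = make_source(
    "CREATE TABLE sample (patient_id INTEGER, assay TEXT)",
    [(1, "rna"), (1, "dna"), (3, "rna")],
    "INSERT INTO sample VALUES (?, ?)",
)

result = federated_join(
    patients, "SELECT id, name FROM patient ORDER BY id",
    samples, "SELECT patient_id, assay FROM sample WHERE assay = 'rna' ORDER BY patient_id",
    left_key=0, right_key=0,
)
print(result)  # [(1, 'ann', 1, 'rna'), (3, 'cat', 3, 'rna')]
```

Pushing the `WHERE assay = 'rna'` filter down to the sample source means only matching rows are shipped to the coordinator, which is the kind of placement decision a real distributed query optimiser makes on the user's behalf.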
Machine learning and statistical model-based classifiers are increasingly applied to complex, high-dimensional biological data obtained from high-throughput technologies. Understanding how the characteristics of large and complex microarray datasets affect the predictive performance of classifiers is computationally intensive and under-investigated, yet vital for determining the optimal number of biomarkers for classification tasks aimed at improved detection, diagnosis, and therapeutic monitoring of diseases. We use simulation studies to investigate the impact of microarray-based data characteristics on the predictive performance of various classification rules. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change, and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking all of these factors into account. A database of average generalization errors was built for various combinations of these factors. This database can be used to estimate the optimal number of biomarkers for a given level of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those from simulated data with corresponding levels of data characteristics. An R package, optBiomarker, implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
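The abstract estimates average generalization error by simulation across combinations of data characteristics. The sketch below computes one such estimate under strong simplifying assumptions: markers are independent Gaussians, class 1 differs from class 0 by a common mean shift standing in for fold change, and a plain k-nearest-neighbour classifier is used. It illustrates the simulation idea in Python and is not the optBiomarker implementation (which is an R package); `simulate`, `knn_predict`, and `generalization_error` are hypothetical names.

```python
import math
import random


def simulate(n_per_class, n_markers, shift, sd, rng):
    """Draw two-class data; class-1 markers are shifted by `shift` (a stand-in for fold change)."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(label * shift, sd) for _ in range(n_markers)]
            data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def generalization_error(n_train, n_markers, shift, sd, k=3, n_test=200, reps=20, seed=0):
    """Average test-set error of k-NN over `reps` independently simulated train/test sets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        train = simulate(n_train // 2, n_markers, shift, sd, rng)
        test = simulate(n_test // 2, n_markers, shift, sd, rng)
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)


# One row of a hypothetical error table: error versus training set size.
for n in (10, 40):
    print(n, round(generalization_error(n, n_markers=5, shift=1.0, sd=1.0), 3))
```

Varying `n_train`, `shift`, or `sd` and recording the returned error for each combination would fill in one slice of a table like the database described above; correlation between markers and replication are deliberately not modelled in this sketch.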
In modern distributed computing, vast amounts of data are stored in many different formats, using different storage solutions. Much modern research involves analyzing data from experiments and simulations held in existing repositories, often owned by different organizations and partners. With online storage and provisioning, data sharing has become a key part of modern research in the sciences and humanities. Because data may be distributed across multiple geographical locations and use different formats and access mechanisms, a solution that enables researchers and developers to share data whilst also satisfying the requirements of data providers would be advantageous.
OGSA-DAI 3.0 [1] is such a middleware solution. It gives application developers the means to access data distributed across multiple platforms with different native access mechanisms. Data integration can take place at the server, and OGSA-DAI can deliver results using a variety of protocols and mechanisms. It achieves this through a highly flexible and extensible framework that can accommodate different types of data resources, such as XML databases, relational databases, or files, and different operations, such as transformation to other formats and selection or filter operations. Developers can extend the framework to provide customized functionality for project-specific tasks while using generic functions for common tasks such as database querying.
OGSA-DAI 3.0 executes data-centric workflows composed of activities (discrete operational tasks), which can target different resources. Workflows can be simple, such as querying a database and delivering the results to an FTP server, or more complex, such as querying relational and XML databases within a single workflow, copying information from one source to another, and e-mailing the copied information to a change-archive account based on different criteria.
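A data-centric workflow of this kind can be pictured as a chain of activities, each consuming the stream of records produced by the previous one. The following sketch assumes that model; it does not use the OGSA-DAI client API (OGSA-DAI itself is a Java framework), and the `source`, `select`, `to_csv_lines`, and `run_workflow` functions, along with the in-memory records standing in for a database, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Activity = Callable[[Iterable[Record]], Iterator[Record]]


def source(records: List[Record]) -> Activity:
    """Hypothetical 'query' activity: emits records from an in-memory stand-in for a database."""
    def activity(_upstream: Iterable[Record]) -> Iterator[Record]:
        yield from records
    return activity


def select(predicate: Callable[[Record], bool]) -> Activity:
    """Filter activity: passes through only the records that satisfy `predicate`."""
    def activity(upstream: Iterable[Record]) -> Iterator[Record]:
        return (r for r in upstream if predicate(r))
    return activity


def to_csv_lines(fields: List[str]) -> Activity:
    """Transformation activity: converts each record into a single CSV line."""
    def activity(upstream: Iterable[Record]) -> Iterator[Record]:
        for r in upstream:
            yield {"line": ",".join(str(r[f]) for f in fields)}
    return activity


def run_workflow(activities: List[Activity]) -> List[Record]:
    """Chain activities so that each one consumes the stream produced by the previous one."""
    stream: Iterable[Record] = iter(())
    for activity in activities:
        stream = activity(stream)
    return list(stream)  # stands in for a delivery activity (e.g. writing to an FTP server)


rows = [
    {"id": 1, "status": "changed"},
    {"id": 2, "status": "same"},
    {"id": 3, "status": "changed"},
]
out = run_workflow([
    source(rows),
    select(lambda r: r["status"] == "changed"),
    to_csv_lines(["id", "status"]),
])
print([r["line"] for r in out])  # ['1,changed', '3,changed']
```

Because each activity is a generator, records flow through the chain one at a time rather than being materialized between steps; only the final `list` call in `run_workflow` collects them.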