Abstract. The significance of regular path queries (RPQs) on graph-like data structures has grown steadily over the past decade. RPQs are, often in restricted forms, part of graph-oriented query languages such as XQuery/XPath and SPARQL, and have applications in areas such as semantic, social, and biomedical networks. However, existing systems for evaluating RPQs are restricted either in the type of the graph (e.g., only trees), the type of regular expressions (e.g., only single steps), and/or the size of the graphs they can handle. No method has yet been developed that would be capable of efficiently evaluating general RPQs on large graphs, i.e., with millions of nodes/edges. We present a novel approach for answering RPQs on large graphs. Our method exploits the fact that not all labels in a graph are equally frequent. We devise an algorithm which decomposes an RPQ into a series of smaller RPQs using rare labels, i.e., elements of the query with few matches, as way-points. A search thereby is decomposed into a set of smaller search problems which are tackled in a bi-directional fashion, supported by a set of graph indexes. Comparison of our algorithm with two approaches following the traditional methods for tackling such problems, i.e., the usage of automata, reveals that (a) the automata-based methods are not able to handle large graphs due to the amount of memory they require, and that (b) our algorithm outperforms the automata-based approach, often by orders of magnitude. Another advantage of our algorithm is that it can be parallelized easily.
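The way-point idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a heavily restricted query form (a plain sequence of edge labels rather than a general regular expression), and the names `build_indexes`, `follow`, and `evaluate` as well as the toy graph are hypothetical. The sketch picks the rarest label in the query as the way-point, then searches backward for the query prefix and forward for the query suffix from each matching edge.

```python
from collections import defaultdict


def build_indexes(edges):
    """Build forward and backward label indexes plus label frequencies.

    `edges` is an iterable of (source, label, target) triples (assumed format).
    """
    fwd = defaultdict(lambda: defaultdict(set))  # label -> source -> {targets}
    bwd = defaultdict(lambda: defaultdict(set))  # label -> target -> {sources}
    count = defaultdict(int)                     # label -> number of distinct edges
    for src, label, dst in set(edges):
        fwd[label][src].add(dst)
        bwd[label][dst].add(src)
        count[label] += 1
    return fwd, bwd, count


def follow(index, labels, start):
    """Follow a sequence of labels from `start` through the given index."""
    frontier = {start}
    for label in labels:
        by_node = index.get(label, {})
        frontier = {nxt for node in frontier for nxt in by_node.get(node, ())}
        if not frontier:
            break
    return frontier


def evaluate(edges, query):
    """Return all (start, end) node pairs connected by the label sequence `query`."""
    fwd, bwd, count = build_indexes(edges)
    # Way-point: the query position whose label has the fewest matching edges.
    i = min(range(len(query)), key=lambda k: count.get(query[k], 0))
    prefix, suffix = query[:i], query[i + 1:]
    results = set()
    for src, dsts in fwd.get(query[i], {}).items():
        # Backward search: match the prefix in reverse, ending at the way-point edge.
        starts = follow(bwd, list(reversed(prefix)), src)
        if not starts:
            continue
        for dst in dsts:
            # Forward search: match the suffix, starting after the way-point edge.
            ends = follow(fwd, suffix, dst)
            results.update((s, e) for s in starts for e in ends)
    return results


# Hypothetical toy graph: "common" is frequent, "rare" occurs once.
edges = [
    ("a", "common", "b"),
    ("b", "common", "c"),
    ("c", "rare", "d"),
    ("d", "common", "e"),
    ("x", "common", "y"),
]
print(evaluate(edges, ["common", "rare", "common"]))
```

Because each way-point edge is expanded independently of the others, the loop over matching edges in this sketch could be distributed across workers, which is in line with the abstract's remark on parallelization.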
The microarray-based analysis of gene expression has become a workhorse for biomedical research. Managing the amount and diversity of data that such experiments produce is a task that must be supported by appropriate software tools, which has led to the creation of literally hundreds of systems. In consequence, choosing the right tool for a given project is difficult even for the expert. We report on the results of a survey encompassing 78 such tools, of which 22 were inspected in detail and seven were tested hands-on. We report on our experiences with a focus on completeness of functionality, ease of use, and necessary effort for installation and maintenance. Our survey thereby provides a valuable guideline for any project considering the use of a microarray data management system. It reveals which tasks are covered by mature tools and also shows that important requirements, especially in the area of integrated analysis of different experimental data, are not yet met satisfactorily by existing systems.
Abstract. Current projects in Systems Biology often produce a multitude of different high-throughput data sets that need to be managed, processed, and analyzed in an integrated fashion. In this paper, we present the OmixAnalyzer, a web-based tool for management and analysis of heterogeneous omics data sets. It currently supports gene microarrays, miRNAs, and exon-arrays; support for mass spectrometry-based proteomics is on the way, and further types can easily be added due to its plug-and-play architecture. Distinct from competitor systems, the OmixAnalyzer supports management, analysis, and visualization of data sets; it features a mature system of access rights, handles heterogeneous data sets including metadata, supports various import and export formats, includes pipelines for performing all steps of data analysis from normalization and quality control to differential analysis, clustering and functional enrichment, and it is capable of producing high quality figures and reports. The system builds only on open source software and is available on request as sources or as a ready-to-run software image. An instance of the tool is available for testing at omixanalyzer.informatik.hu-berlin.de.