Biological processes (such as microbial growth and physiological response) are usually dynamic and require the monitoring of metabolic variation at different time points. Moreover, there is a clear shift from case-control (N=2) studies to multi-class (N>2) problems in current metabolomics, which is crucial for revealing the mechanisms underlying physiological processes, disease metastasis, etc. Time-course and multi-class metabolomics have therefore attracted great attention, and data normalization is essential for removing unwanted biological/experimental variations in these studies. However, no tool (including NOREVA 1.0, which focuses only on case-control studies) is available for effectively assessing the performance of normalization methods on time-course/multi-class metabolomic data. Thus, NOREVA was updated to version 2.0 by (i) enabling normalization and evaluation of both time-course and multi-class metabolomic data, (ii) integrating 144 normalization methods based on a recently proposed combination strategy and (iii) identifying well-performing methods by comprehensively assessing the largest set of normalizations so far (168 in total, substantially more than the 24 in NOREVA 1.0). The significance of this update was extensively validated by case studies on benchmark datasets. All in all, NOREVA 2.0 is distinguished by its capability to identify well-performing normalization method(s) for time-course and multi-class metabolomics, which makes it an indispensable complement to other available tools. NOREVA can be accessed at https://idrblab.org/noreva/.
Skyline computation has many applications, including multi-criteria decision making. In this paper, we study the problem of efficiently processing continuous skyline queries over sliding windows on uncertain data elements with respect to given probability thresholds. We first characterize which elements need to be kept for query computation. We then characterize the size of the dynamically maintained candidate set and the size of the skyline. We develop novel, efficient techniques to process a continuous, probabilistic skyline query. Finally, we extend our techniques to applications where multiple probability thresholds are given or where we want to retrieve the "top-k" skyline data objects. Our extensive experiments demonstrate that the proposed techniques are very efficient and can handle a high-speed data stream in real time.
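For context, skyline computation keeps exactly those points that are not dominated by any other point, where a point dominates another if it is at least as good in every dimension and strictly better in at least one. The sketch below is a minimal, naive version of this dominance test on certain (non-probabilistic), static data, not the streaming technique proposed in the paper; the hotel tuples and the "smaller is better" convention are illustrative assumptions.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better
    in at least one (here, smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points not dominated by any other point."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

# Hypothetical example: hotels as (price, distance); cheaper and closer is better.
hotels = [(50, 8), (60, 3), (45, 9), (65, 7), (55, 6)]
print(skyline(hotels))   # -> [(50, 8), (60, 3), (45, 9), (55, 6)]; (65, 7) is dominated
```

The paper's contribution is the probabilistic, sliding-window version of this query, where each element must remain in the skyline with probability at least the given threshold as the window slides.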
While RNA sequencing (RNA-seq) has become increasingly popular for transcriptome profiling, the analysis of the massive amount of data generated by large-scale RNA-seq remains a challenge. RNA-seq data analyses typically consist of (1) accurate mapping of millions of short sequencing reads to a reference genome, including the identification of splicing events; (2) quantifying expression levels of genes, transcripts, and exons; (3) differential analysis of gene expression among different biological conditions; and (4) biological interpretation of differentially expressed genes. Although multiple algorithms pertinent to these basic analyses have been developed, a variety of questions remain unresolved. In this chapter, we review the main tools and algorithms currently available for RNA-seq data analyses, with the goal of helping RNA-seq data analysts make an informed choice of tools in practical RNA-seq data analysis. Because RNA-seq is evolving rapidly, newer sequencing applications are also briefly introduced, including stranded RNA-seq, targeted RNA-seq, and single-cell RNA-seq.
No abstract
In this paper, we study the problem of approximate containment similarity search. Given two records Q and X, the containment similarity between Q and X with respect to Q is |Q ∩ X| / |Q|. Given a query record Q and a set of records S, containment similarity search finds the records in S whose containment similarity with regard to Q is not less than a given threshold. This problem has many important applications in commercial and scientific fields such as record matching and domain search. The existing solution relies on an asymmetric LSH method that transforms containment similarity into the well-studied Jaccard similarity. In this paper, we use an inherently different framework that transforms containment similarity into set intersection. We propose a novel augmented KMV sketch technique, namely GB-KMV, which is data-dependent and can achieve a much better trade-off between sketch size and accuracy. We provide a set of theoretical analyses to underpin the proposed augmented KMV sketch technique and show that it outperforms the state-of-the-art technique LSH-E in terms of estimation accuracy under practical assumptions. Our comprehensive experiments on real-life datasets verify that GB-KMV is superior to LSH-E in terms of the space-accuracy trade-off, time-accuracy trade-off, and sketch construction time. For instance, with similar estimation accuracy (F1 score), GB-KMV is over 100 times faster than LSH-E on several real-life datasets.
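To make the estimated quantity concrete, here is a small, hypothetical Python sketch that estimates the containment similarity |Q ∩ X| / |Q| using a plain KMV sketch (the k minimum hash values of a set). This is not the augmented GB-KMV structure proposed in the paper; the hash function, the choice of k, and the toy sets are illustrative assumptions only.

```python
import hashlib

def h(x):
    """Hash an element to a float in [0, 1)."""
    digest = hashlib.md5(str(x).encode()).hexdigest()
    return int(digest, 16) / 16**32

def kmv_sketch(items, k):
    """Plain KMV sketch: the k smallest hash values of a set."""
    return sorted({h(x) for x in set(items)})[:k]

def estimate_intersection(sk_a, sk_b, k):
    """Estimate |A ∩ B| from two plain KMV sketches (Beyer et al.-style estimator)."""
    common = set(sk_a) & set(sk_b)                 # hash values present in both sketches
    union = sorted(set(sk_a) | set(sk_b))[:k]      # KMV sketch of A ∪ B
    if len(union) < k:                             # both sets are small: sketches are exact
        return len(common)
    d_union = (k - 1) / union[-1]                  # distinct-count estimate for |A ∪ B|
    overlap = sum(1 for v in union if v in common)
    return (overlap / k) * d_union

# Containment similarity of X with respect to the query Q is |Q ∩ X| / |Q|.
Q = set(range(0, 200))
X = set(range(100, 1100))
k = 64
estimate = estimate_intersection(kmv_sketch(Q, k), kmv_sketch(X, k), k) / len(Q)
print(f"estimated containment: {estimate:.2f}   exact: {len(Q & X) / len(Q):.2f}")
```

The containment estimate divides the estimated intersection size by |Q|, which is known exactly on the query side; GB-KMV, as described in the abstract, augments this kind of sketch in a data-dependent way to achieve a better size-accuracy trade-off, and the plain version above only illustrates what is being estimated.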