The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account.This paper proposes a new indexing framework for locationaware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.
BackgroundOne major goal of large-scale cancer omics study is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integrative and efficiently find the principal low-dimensional manifold of the high-dimensional cancer multi-omics data.ResultsIn this study, we proposed a novel low-rank approximation based integrative probabilistic model to fast find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performances than the existing method. Then, we applied LRAcluster on large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that the cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas. While the single cancer type analysis suggests that the omics data have different subtyping abilities for different cancer types.ConclusionsLRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-015-2223-8) contains supplementary material, which is available to authorized users.
Web users and content are increasingly being geopositioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving topk spatial keyword (MkSK) queries over spatial keyword data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within a zone. However, existing safe zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the computation on the server as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. Empirical studies with real data suggest that our proposals are efficient.
Epigenetic modification of miR-429 can manipulate liver T-ICs by targeting the RBBP4/E2F1/OCT4 axis. This miRNA might be targeted to inactivate T-ICs, thus providing a novel strategy for HCC prevention and treatment.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.