Traditional stock market prediction approaches commonly use the historical price-related data of stocks to forecast their future trends. As Web information grows, recent work has explored financial news to improve the prediction. Effective indicators, e.g., the events related to the stocks and people's sentiments towards the market and stocks, have been shown to play important roles in the stocks' volatility, and are extracted and fed into the prediction models to improve prediction accuracy. However, a major limitation of previous methods is that the indicators are obtained either from a single source whose reliability might be low, or from several data sources whose interactions and correlations are largely ignored. In this work, we extract the events from Web news and the users' sentiments from social media, and investigate their joint impacts on the stock price movements via a coupled matrix and tensor factorization framework. Specifically, a tensor is first constructed to fuse heterogeneous data and capture the intrinsic relations among the events and the investors' sentiments. Due to the sparsity of the tensor, two auxiliary matrices, the stock quantitative feature matrix and the stock correlation matrix, are constructed and incorporated to assist the tensor decomposition. The intuition is that stocks that are highly correlated with each other tend to be affected by the same event. Thus, instead of conducting each stock prediction task separately and independently, we predict multiple correlated stocks simultaneously through their commonalities, which are captured by sharing the collaboratively factorized low-rank matrices between the auxiliary matrices and the tensor. Evaluations on the China A-share stock data and the HK stock data in the year 2015 demonstrate the effectiveness of the proposed model.
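The idea of sharing a factor between a tensor and an auxiliary matrix can be illustrated with a minimal sketch. This is not the authors' model: it assumes a simplified CP-style decomposition fitted by plain gradient descent, a single auxiliary matrix (standing in for the stock quantitative feature matrix), and hypothetical tensor modes (stocks, event features, sentiment features). The function name `coupled_factorize`, the weight `lam`, and all sizes are illustrative assumptions.

```python
import numpy as np

def coupled_factorize(X, Y, rank, lam=1.0, lr=0.05, steps=2000, seed=0):
    """Jointly fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] and Y ~ A @ V.T.

    The stock factor A is shared by the tensor and the auxiliary matrix,
    so information in Y helps fill in a sparse X (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A = rng.uniform(0.0, 0.5, (I, rank))
    B = rng.uniform(0.0, 0.5, (J, rank))
    C = rng.uniform(0.0, 0.5, (K, rank))
    V = rng.uniform(0.0, 0.5, (M, rank))
    history = []
    for _ in range(steps):
        res_t = np.einsum("ir,jr,kr->ijk", A, B, C) - X  # tensor residual
        res_m = A @ V.T - Y                              # matrix residual
        history.append(0.5 * np.sum(res_t**2) + 0.5 * lam * np.sum(res_m**2))
        # Gradients of the joint squared-error loss w.r.t. each factor.
        g_a = np.einsum("ijk,jr,kr->ir", res_t, B, C) + lam * res_m @ V
        g_b = np.einsum("ijk,ir,kr->jr", res_t, A, C)
        g_c = np.einsum("ijk,ir,jr->kr", res_t, A, B)
        g_v = lam * res_m.T @ A
        A, B, C, V = A - lr * g_a, B - lr * g_b, C - lr * g_c, V - lr * g_v
    return A, B, C, V, history

# Synthetic data: 6 stocks, 5 event features, 4 sentiment features, 3 stock features.
rng = np.random.default_rng(42)
A0, B0, C0, V0 = (rng.uniform(0, 1, (n, 2)) for n in (6, 5, 4, 3))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
Y = A0 @ V0.T
A, B, C, V, history = coupled_factorize(X, Y, rank=2)
```

Because `A` appears in both residuals, each gradient step on the stock factor is driven by the tensor and the auxiliary matrix together, which is the coupling the abstract describes. A second auxiliary matrix (such as a stock correlation matrix) would add a further term to the loss in the same way.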
Detecting near duplicates on the web is challenging due to the web's volume and variety. Most previous studies require input parameters to be set, making it difficult for them to achieve robustness across various scenarios without careful tuning. Recently, a universal and parameter-free similarity metric, the normalized compression distance (NCD), has been employed effectively in diverse applications. Nevertheless, NCD is hard to apply to medium-to-large datasets because it lacks efficiency and tends to get skewed by large object sizes. To make this parameter-free method feasible on a large corpus of web documents, we propose a new method called SigNCD, which measures NCD based on lightweight signatures instead of full documents, leading to improved efficiency and stability. We derive various lower bounds of NCD and propose pruning policies to further reduce computational complexity. We evaluate SigNCD on both English and Chinese datasets and show an increase in F1 score compared with the original NCD method and a significant reduction in runtime. Comparisons with other competitive methods also demonstrate the superiority of our method. Moreover, no parameter tuning is required in SigNCD, except for a similarity threshold.
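For reference, the standard NCD of two objects x and y is (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size. The sketch below computes it on full byte strings with `zlib` as the compressor, which is an assumption made for illustration; the signature extraction, lower bounds, and pruning policies of SigNCD are not reproduced here.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical documents: doc_b is a near duplicate of doc_a, doc_c is unrelated.
doc_a = " ".join(str(i * i) for i in range(300)).encode()
doc_b = doc_a + b" one extra trailing sentence"
doc_c = " ".join(str(i ** 3 + 7) for i in range(300, 600)).encode()

near_duplicate_distance = ncd(doc_a, doc_b)
unrelated_distance = ncd(doc_a, doc_c)
```

A pair would be flagged as near duplicates when its distance falls below the similarity threshold mentioned in the abstract. Each comparison compresses the concatenated pair, which is why the cost grows quickly on a large corpus of full documents.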
Background: Rheumatism covers a wide range of diseases with complex clinical manifestations and places a tremendous burden on humans. For many years, our understanding of rheumatism was seriously hindered by technology constraints. However, the increasing application and rapid advancement of sequencing technology in the past decades have enabled us to study rheumatism with greater accuracy and in more depth. Sequencing technology has made huge contributions to the field and is now an indispensable component and powerful tool in the study of rheumatism. Methods: Articles on sequencing and rheumatism, published from 1 January 2000 to 25 April 2022, were retrieved from the Web of Science™ (Clarivate™, Philadelphia, PA, USA) database. Bibliometrix, an open-source tool, was used for the analysis of publication years, countries, authors, sources, citations, keywords, and co-words. Results: The 1,374 articles retrieved came from 62 countries and 350 institutions, with a general increase in article numbers during the last 22 years. The leading countries in terms of publication numbers and active cooperation with other countries were the USA and China. The most prolific authors and most popular documents were identified to establish the historiography of the field. Popular and emerging research topics were assessed by keywords and co-occurrence analysis. Immunological and pathological processes in rheumatism, classification, risks and susceptibility, and biomarkers for diagnosis were among the hottest themes for research. Conclusions: Sequencing technology has been widely applied in the study of rheumatism and propels research on the discovery of novel biomarkers, related gene patterns, and physiopathology. We suggest that further efforts be made to advance the study of genetic patterns related to rheumatic susceptibility, pathogenesis, classification and disease activity, and novel biomarkers.