Microarrays contain large matrices of information and have been widely used by biologists and bio data scientists to monitor combinations of genes in different organisms. This study investigates coherent patterns across all continuous columns of gene microarray data matrices. We develop a time series similarity measure for these patterns, together with an evaluation function for verifying the proposed algorithm and the corresponding biclusters. The coherent patterns take continuous time changes into account, so that co-expression patterns in time series can be searched. To use all of the information common to the sequences, a similarity measure for coherent patterns in continuous columns is defined. To validate how well the similarity measure mines biological information at continuous time points, an evaluation function is defined to score biclusters, and an effective algorithm is proposed to mine them. Experiments on synthetic datasets and real gene microarray datasets are conducted to verify the biological significance of the biclusters. The performance of the algorithm is analyzed, and the results show that it is highly efficient.
Predicting stock market trends has long been a challenging task, because the market is affected by a variety of deterministic and stochastic factors. In this paper, a biclustering algorithm is introduced to find local patterns in quantized historical data. The local patterns obtained are regarded as trading rules. These rules are then applied to short-term prediction of the stock price, combined with minimum-error-rate classification from Bayes decision theory under the assumption of a multivariate normal probability model. In addition, the paper uses the idea of stream mining to weaken the impact of historical data on the model and to update the trading rules dynamically. Experiments on real datasets demonstrate the effectiveness of the proposed algorithm.
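As a rough illustration of what "quantized historical data" could look like, the sketch below maps a price series onto a small number of integer levels with equal-width bins. The binning scheme, the function name `quantize`, and the parameter `n_levels` are assumptions made for this example; the abstract does not say how the authors quantize their data.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], n_levels: int) -> List[int]:
    """Map each value to an integer level in 0..n_levels-1 using equal-width bins.

    Equal-width binning is an assumption of this sketch, not the paper's method.
    """
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no level information; put it all in level 0.
        return [0] * len(values)
    width = (hi - lo) / n_levels
    # The maximum value would land in bin n_levels, so clamp it to the top bin.
    return [min(int((v - lo) / width), n_levels - 1) for v in values]


# Hypothetical closing prices for five trading days.
prices = [10.0, 10.5, 11.0, 12.0, 14.0]
levels = quantize(prices, n_levels=4)
print(levels)
```

A matrix of such levels (for example, one row per trading day and one column per indicator) is the kind of discrete input a biclustering step could then search for local patterns.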
Order-preserving submatrices (OPSMs) are an important unsupervised learning model and have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems. Unfortunately, because the problem is NP-complete, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In particular, deep OPSMs, which correspond to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm is adjusted to find all common subsequences (ACS) between every two row sequences, so that no deep OPSM is missed. Then, an improved prefix tree data structure is used to store and traverse the ACS, and the Apriori principle is employed to mine the frequent sequential patterns efficiently. Finally, experiments are carried out on gene and synthetic datasets. The results demonstrate the effectiveness and efficiency of the method.
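The order-preserving idea can be made concrete with a small sketch. It shows only the standard OPSM definition (a set of rows whose values strictly increase along a shared column ordering) and why common subsequences of row sequences are relevant. It does not implement the ACS enumeration, the prefix tree, or the Apriori pruning described in the abstract. All function names are hypothetical, and rows are assumed to have no tied values.

```python
from typing import List, Sequence


def column_order(row: Sequence[float]) -> List[int]:
    """Turn a row into its 'row sequence': column indices sorted by ascending value."""
    return sorted(range(len(row)), key=lambda j: row[j])


def is_subsequence(pattern: Sequence[int], seq: Sequence[int]) -> bool:
    """True if pattern occurs in seq in order, not necessarily contiguously."""
    it = iter(seq)
    return all(col in it for col in pattern)


def supports(row: Sequence[float], pattern: Sequence[int]) -> bool:
    """A row supports a column pattern if its values strictly increase along it."""
    return all(row[a] < row[b] for a, b in zip(pattern, pattern[1:]))


def supporting_rows(matrix: Sequence[Sequence[float]], pattern: Sequence[int]) -> List[int]:
    """Indices of all rows that support the pattern."""
    return [i for i, row in enumerate(matrix) if supports(row, pattern)]


# Hypothetical 3 x 4 expression matrix (rows = genes, columns = conditions).
matrix = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 8.0, 4.0, 6.0],
    [7.0, 1.0, 3.0, 2.0],
]
pattern = [0, 2, 3]  # column 0 < column 2 < column 3
print(supporting_rows(matrix, pattern))
```

With no ties, a row supports a pattern exactly when the pattern is a subsequence of that row's column order, so any pattern shared by two rows is a common subsequence of their row sequences. This is why enumerating common subsequences between pairs of rows can serve as a starting point for exact OPSM mining.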
Most traditional biclustering algorithms focus on biclustering models over non-contiguous columns, which are not suitable for analyzing time series gene expression data. We propose an effective and exact algorithm that mines biclusters with coherent evolution on contiguous columns, as well as complementary biclusters and time-lagged biclusters, for the analysis of time series gene expression data. The experimental results show that the algorithm can find biclusters with statistical significance and strong biological relevance. We also extend it to currency data analysis in the financial field and obtain meaningful results.
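To illustrate what coherent evolution on contiguous columns means, the sketch below encodes each row as a string of up (U), down (D), and no-change (N) symbols between consecutive time points, then groups rows that share the same string over a chosen column window. This is a simplified illustration under assumed conventions (the symbol alphabet, the `threshold` parameter, and all function names are hypothetical), not the exact algorithm from the abstract, and it does not handle time-lagged biclusters.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def discretize(row: Sequence[float], threshold: float = 0.0) -> str:
    """Encode changes between consecutive time points as U, D, or N."""
    symbols = []
    for prev, curr in zip(row, row[1:]):
        diff = curr - prev
        if diff > threshold:
            symbols.append("U")
        elif diff < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def group_rows(
    matrix: Sequence[Sequence[float]], start: int, end: int, threshold: float = 0.0
) -> Dict[str, List[int]]:
    """Group row indices by their pattern over contiguous columns start..end (inclusive)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, row in enumerate(matrix):
        groups[discretize(row[start:end + 1], threshold)].append(i)
    return dict(groups)


def complement(pattern: str) -> str:
    """Swap U and D to get the opposite-trend (complementary) pattern."""
    return pattern.translate(str.maketrans("UD", "DU"))


# Hypothetical expression levels for four genes at five time points.
matrix = [
    [1, 2, 3, 2, 1],
    [0, 4, 5, 3, 0],
    [5, 4, 2, 3, 6],
    [1, 1, 2, 1, 1],
]
groups = group_rows(matrix, start=1, end=3)
print(groups)
print(complement("UD"))
```

Rows that share a pattern over the window form a candidate bicluster on those contiguous columns, and rows matching the complemented pattern form the corresponding complementary candidate. An exact miner would search over all windows rather than a single fixed one.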