High-dimensional data contain many irrelevant attributes, are sparsely distributed, and are costly to compute over, so traditional clustering algorithms such as K-means perform poorly on them. To address this problem, this paper studies an integrated clustering method for high-dimensional data. A subspace division method based on minimum redundancy is proposed, in which the division is refined with the K-means algorithm. The method replaces the distance between characteristic variables used in K-means with the mutual information between them, and divides the data into subspaces according to these mutual information values. To achieve high clustering accuracy and diversity, a genetic algorithm serves as the consistency integration function. The fitness function is designed according to the clustering fusion target, and the selection operator is designed according to the maximum number of overlapping elements in the base clustering. The proposed algorithm is compared with other commonly used clustering fusion algorithms across datasets; the experimental results show that it outperforms the other methods on most datasets and is an effective clustering integration algorithm.
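The idea of grouping characteristic variables by mutual information rather than by distance can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the features are already discretized, and the helper names `mutual_information` and `assign_features`, as well as the fixed list of medoid features, are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def assign_features(columns, medoids):
    """Assign each feature index to the medoid feature it shares the most MI with.

    `columns` is a list of feature columns; `medoids` is a list of feature
    indices chosen as subspace representatives (an assumption of this sketch).
    """
    groups = {m: [] for m in medoids}
    for j, col in enumerate(columns):
        if j in groups:
            groups[j].append(j)  # a medoid belongs to its own subspace
            continue
        best = max(medoids, key=lambda m: mutual_information(col, columns[m]))
        groups[best].append(j)
    return groups


# Four toy binary features: 0 and 1 are identical, 3 is the complement of 2.
columns = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
subspaces = assign_features(columns, medoids=[0, 2])
```

In this toy case, features 0 and 1 end up in one subspace and features 2 and 3 in the other, because each pair is fully informative about itself and carries no information about the other pair. A full K-means-style procedure would also re-select the representatives iteratively, which this sketch omits.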
In recent years, interest in time series modeling and its application to prediction has grown steadily. This paper discusses a financial time series image algorithm based on wavelet analysis and data fusion. Following the scale change rule of the wavelet transform, we study in depth the scale decomposition sequences and the wavelet transform sequences in the different scale domains. Wavelet neural networks with different numbers of input and hidden neurons are used to predict each sequence separately. The individual predictions are then fused, using wavelet reconstruction, into a final prediction for the original time series. The wavelet transform sequences on five scales are modeled with the RBF neural network algorithm in SPSS Clementine. Each network model has three layers (one input layer, one hidden layer, and one output layer), and each output layer has a single output element. As a baseline for comparison, an ordinary RBF network is used to model and predict the log yield series itself. With an input size of 5, the minimum mean square error, 1.6349, is obtained with a hidden layer of size 6; the mean square error of the training phase is 0.0209, and the validation error is 1.6141. The results show that combining wavelet prediction with RBF network prediction gives better predictions than either wavelet prediction or RBF network prediction alone.
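The decompose, predict per scale, and fuse pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the Haar wavelet (the abstract does not name the wavelet), and it takes an arbitrary `predictor` callable in place of the wavelet and RBF networks. All function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the recovered sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def decompose(signal, levels):
    """Return [detail_1, ..., detail_L, approx_L].

    len(signal) must be divisible by 2**levels.
    """
    coeffs, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs


def reconstruct(coeffs):
    """Rebuild a signal from the coefficient list produced by decompose."""
    approx = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        approx = haar_inverse_step(approx, detail)
    return approx


def scale_components(signal, levels):
    """Split the signal into per-scale series that sum back to the original."""
    coeffs = decompose(signal, levels)
    parts = []
    for k in range(len(coeffs)):
        # Keep only band k, zero every other band, and invert the transform.
        masked = [c if i == k else [0.0] * len(c) for i, c in enumerate(coeffs)]
        parts.append(reconstruct(masked))
    return parts


def fuse_forecasts(signal, levels, predictor):
    """Forecast each scale component separately and add the forecasts."""
    return sum(predictor(part) for part in scale_components(signal, levels))


series = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 6.0, 8.0]
parts = scale_components(series, levels=2)
# Placeholder predictor: repeat the last value of each component.
forecast = fuse_forecasts(series, levels=2, predictor=lambda s: s[-1])
```

Because the inverse transform is linear, the per-scale components add up to the original series, so one-step forecasts made on each component can simply be summed. With the placeholder persistence predictor the fused forecast reduces to the last observed value; a trained network per scale would replace that callable.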