The traditional K-means algorithm is widely used in cluster analysis. However, it uses distance as its only constraint, which makes it sensitive to special data points. To address this problem, ambiguity is introduced as a new constraint in the K-means clustering process. On this basis, a new membership equation is proposed, together with a method for solving the initial cluster center points that reduces the risk caused by random selection of initial points. In addition, an optimized clustering algorithm with a Gaussian distribution is derived by using fuzzy entropy as the cost function constraint. Compared with the traditional clustering method, the new equation's membership degree reflects the relationship between a point and a set more clearly, and it addresses the tendency of the traditional K-means algorithm to be trapped in local convergence and to be easily influenced by noise. Experimental verification shows that the new method requires fewer iterations and achieves better clustering accuracy than other methods.
INDEX TERMS K-means; fuzzy entropy; cluster center; membership degree; fuzzy clustering
I. INTRODUCTION
The clustering process is the most effective classification method for people to summarize complex external information [1]. Although classification is now well developed, clustering algorithms still face the challenge of how to realize cognition, learning, and classification under unsupervised conditions by extracting data features [2]. No model can be applied universally and achieve better results, since the model is not known a priori [3]. Data imply enormous scientific and commercial value [4], especially given the explosive growth of data production in recent years.
In 2016, the global data volume reached 10 ZB and maintained an annual growth rate of more than 40%. Scattered raw data, processed with data mining technology, can deliver valuable results, such as the planning of humanities and the construction of biological sciences in references [6]-[8]. This type of research is of great significance for social development as well as for human self-cognition and learning cognition. Clustering research on various types of data has clearly attracted academic attention for a long time [9].
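The abstract above describes a membership degree derived by using fuzzy entropy as a constraint on the distance-based cost, which yields a Gaussian-shaped membership. The paper's exact equation is not given in this excerpt, so the following is only a minimal sketch of the generic entropy-regularized form, in which the membership of a point in cluster k is proportional to exp(-d_k^2 / lam). The function names, the parameter `lam`, and the toy data are assumptions made for illustration.

```python
import math

def entropy_memberships(point, centers, lam=1.0):
    """Memberships proportional to exp(-squared_distance / lam), summing to 1.

    `lam` is a hypothetical weight on the entropy term: a larger value gives
    softer (more ambiguous) memberships, a smaller value approaches hard K-means.
    """
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    shift = min(d2)  # subtract the smallest distance for numerical stability
    weights = [math.exp(-(d - shift) / lam) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

def update_centers(points, memberships):
    """Each center becomes the membership-weighted mean of all points."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for k in range(n_clusters):
        weight_sum = sum(u[k] for u in memberships)
        centers.append([
            sum(u[k] * x[j] for x, u in zip(points, memberships)) / weight_sum
            for j in range(dim)
        ])
    return centers

# Toy data: two well-separated groups (illustrative only).
points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9)]
centers = [(0.0, 0.0), (4.0, 4.0)]
for _ in range(5):
    memberships = [entropy_memberships(x, centers) for x in points]
    centers = update_centers(points, memberships)
```

Because every point contributes to every center with a smoothly decaying weight, a single distant point pulls a center less abruptly than a hard assignment would, which is the kind of noise sensitivity the abstract says the method targets.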
Click-through rate (CTR) prediction has become one of the core tasks of recommendation systems and online advertising with the development of e-commerce. In the CTR prediction field, different feature extraction schemes are used to mine user click behavior to maximize CTR, which helps advertisers maximize their profits. Progress has been made in CTR prediction based on Deep Neural Networks (DNN), but a DNN alone can only learn high-order feature combinations. In this paper, a Product & Cross supported Stacking Network with LightGBM (PCSNL) is proposed for CTR prediction to solve this problem. First, L1 and L2 regularization are imposed on the Light Gradient Boosting Machine (LightGBM) to prevent overfitting. Second, vector-wise feature interactions are added to the product layer of the product network to learn second-order feature combinations. Finally, feature information is fully learned through the cross network, product network, and stacking network in PCSNL. Experiments are conducted on the online advertising CTR prediction datasets released by Huawei and Avazu on the Kaggle platform. The results show that the PCSN model and PCSNL perform better than traditional CTR prediction models and deep learning models.
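The abstract does not spell out the product layer, so the following is only a sketch of what vector-wise second-order feature interaction could look like. It assumes each feature field has an embedding vector and that an interaction is the inner product of two whole vectors, as in product-based networks. The function name and the three example fields are hypothetical.

```python
from itertools import combinations

def pairwise_inner_products(field_embeddings):
    """Return one inner product per unordered pair of field embeddings.

    Each value is a second-order interaction between two feature fields,
    computed on whole vectors (vector-wise) rather than per element.
    """
    return [
        sum(a * b for a, b in zip(e_i, e_j))
        for e_i, e_j in combinations(field_embeddings, 2)
    ]

# Hypothetical embeddings for three feature fields.
embeddings = [
    [1.0, 0.0, 2.0],  # e.g. a user field
    [0.5, 1.0, 0.0],  # e.g. an ad field
    [0.0, 2.0, 1.0],  # e.g. a context field
]
interactions = pairwise_inner_products(embeddings)
print(interactions)  # [0.5, 2.0, 2.0]
```

With n fields this produces n(n-1)/2 interaction values, which a product layer would typically pass on to later layers alongside the raw embeddings.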