To better demonstrate the evolution relationships between events in newswires and to improve the readability of event evolution graphs, we propose an improved news event evolution model from the perspective of users' willingness to read. The model considers two factors that affect this willingness: the comprehensiveness of the news information and the reading cost. We define a reading cost function to determine the granularity of news events. After clustering the news stories with the K-means algorithm, the model takes the general structure of news reports into account when calculating TF-IDF weights, and then applies corrections and model fusion. Finally, the model parameters are estimated by a genetic algorithm based on Lévy flight. By generating a more readable event evolution graph, our model is better able to discover the evolution relationships between news events. We carried out experiments to evaluate the proposed model. The results show that it outperformed the baseline and other comparable models from previous work by about 13% on a corpus we collected from the CNN and ABC News websites.
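As a rough illustration of structure-aware TF-IDF weighting, the sketch below counts terms more heavily when they appear in the title or lead of a story than in the body. It is only a minimal sketch under stated assumptions: the section names, the values in `SECTION_WEIGHTS`, the smoothed IDF formula, and the helper names are all hypothetical and are not taken from the paper, which does not state its weighting scheme in the abstract.

```python
import math
from collections import Counter

# Hypothetical section weights; the abstract does not give the actual values.
SECTION_WEIGHTS = {"title": 3.0, "lead": 2.0, "body": 1.0}


def weighted_tf(story):
    """Sum section-weighted term counts for one story (dict: section -> text)."""
    tf = Counter()
    for section, text in story.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for token in text.lower().split():
            tf[token] += weight
    return tf


def tfidf_vectors(stories):
    """Return one {term: weight} dict per story, using a smoothed IDF."""
    tfs = [weighted_tf(story) for story in stories]
    n = len(stories)
    # Document frequency: each term is counted once per story it occurs in.
    df = Counter(term for tf in tfs for term in tf)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return [{term: count * idf[term] for term, count in tf.items()} for tf in tfs]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


stories = [
    {"title": "earthquake hits city", "body": "rescue teams arrive"},
    {"title": "earthquake aftermath", "body": "city rescue continues"},
    {"title": "election results", "body": "votes counted"},
]
vectors = tfidf_vectors(stories)
```

The resulting vectors could then be passed to a clustering step or compared pairwise with `cosine` to decide whether two stories belong to related events.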
The challenge in solving data mining problems in e-commerce applications, such as recommendation systems (RS) and click-through rate (CTR) prediction, is how to make inferences by constructing combinatorial features from a large number of categorical features while preserving the interpretability of the method. In this paper, we propose Automatic Embedded Feature Engineering (AEFE), an automatic feature engineering framework for representing categorical features, which consists of several components, including custom paradigm feature construction and multiple feature selection. By intelligently selecting potential field pairs and generating a series of interpretable combinatorial features, our framework provides a set of previously unseen generated features that enhance model performance and help data analysts discover feature importance for particular data mining tasks. Furthermore, AEFE is implemented in a distributed manner using task parallelism, data sampling, and a search schema based on Matrix Factorization field combination, which improves the efficiency and scalability of the framework. Experiments conducted on typical e-commerce datasets indicate that our method outperforms classical machine learning models and state-of-the-art deep learning models.
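To make the idea of interpretable combinatorial features concrete, the sketch below crosses every pair of categorical fields into a readable string feature and ranks the pairs by information gain against the label. This is an illustrative stand-in only: information gain is a swapped-in scoring rule, not the Matrix Factorization based search that AEFE describes, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def cross_features(rows, fields):
    """Build one readable cross feature per field pair, e.g. 'a=0&b=1'."""
    crossed = {}
    for a, b in combinations(fields, 2):
        crossed[(a, b)] = [f"{a}={row[a]}&{b}={row[b]}" for row in rows]
    return crossed


def rank_pairs(rows, fields, labels):
    """Rank field pairs by the information gain of their cross feature."""
    crossed = cross_features(rows, fields)
    scores = {pair: information_gain(col, labels) for pair, col in crossed.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: the label is the XOR of fields a and b, so neither field is
# informative alone, but their combination determines the label exactly.
rows = [
    {"a": 0, "b": 0, "c": "x"},
    {"a": 0, "b": 1, "c": "x"},
    {"a": 1, "b": 0, "c": "x"},
    {"a": 1, "b": 1, "c": "x"},
]
labels = [0, 1, 1, 0]
ranked = rank_pairs(rows, ["a", "b", "c"], labels)
```

Because each generated feature is a plain string naming both fields and their values, an analyst can read the top-ranked pairs directly. Note that information gain tends to favor high-cardinality crosses, so a practical selector would likely need a correction or a held-out check.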