On Twitter, people often use hashtags to mark the subject of a tweet. Tweets have specific themes or content that are easy for people to manage. With the increase in the number of tweets, how to automatically recommend hashtags for tweets has received wide attention. The previous hashtag recommendation methods were to convert the task into a multi-class classification problem. However, these methods can only recommend hashtags that appeared in historical information, and cannot recommend the new ones. In this work, we extend the self-attention mechanism to turn the hashtag recommendation task into a sequence labeling task. To train and evaluate the proposed method, we used the real tweet data which is collected from Twitter. Experimental results show that the proposed method can be significantly better than the most advanced method. Compared with the state-of-the-art methods, the accuracy of our method has been increased 4%.
Clustering technology has been applied in numerous applications. It can enhance the performance of information retrieval systems, it can also group Internet users to help improve the click-through rate of on-line advertising, etc. Over the past few decades, a great many data clustering algorithms have been developed, including K-Means, DBSCAN, Bi-Clustering and Spectral clustering, etc. In recent years, two new data clustering algorithms have been proposed, which are affinity propagation (AP, 2007) and density peak based clustering (DP, 2014). In this work, we empirically compare the performance of these two latest data clustering algorithms with state-of-the-art, using 6 external and 2 internal clustering validation metrics. Our experimental results on 16 public datasets show that, the two latest clustering algorithms, AP and DP, do not always outperform DBSCAN. Therefore, to find the best clustering algorithm for a specific dataset, all of AP, DP and DBSCAN should be considered. Moreover, we find that the comparison of different clustering algorithms is closely related to the clustering evaluation metrics adopted. For instance, when using the Silhouette clustering validation metric, the overall performance of K-Means is as good as AP and DP. This work has important reference values for researchers and engineers who need to select appropriate clustering algorithms for their specific applications.
Relation extraction is an important task in NLP community. However, some models often fail in capturing Long-distance dependence on semantics, and the interaction between semantics of two entities is ignored. In this paper, we propose a novel neural network model for semantic relation classification called joint self-attention bi-LSTM (SA-Bi-LSTM) to model the internal structure of the sentence to obtain the importance of each word of the sentence without relying on additional information, and capture Long-distance dependence on semantics. We conduct experiments using the SemEval-2010 Task 8 dataset. Extensive experiments and the results demonstrated that the proposed method is effective against relation classification, which can obtain state-ofthe-art classification accuracy just with minimal feature engineering.
Recent studies have shown that compared with traditional social networks, networks in which users socialize through interest recommendation have obvious homogeneity characteristics. Recommending topics of interest to users has become one of the main objectives of recommendation systems in such social networks, and the widespread data sparsity in such social networks has become the main problem faced by such recommendation systems. Particularly, in the oracle interest network, this problem is more difficult to solve because there are very few people who read and understand the Oracle. To address this problem, we propose an ant colony algorithm based recognition algorithm that can greatly expand the data in the oracle interest network and thus improve the efficiency of oracle interest network recommendation in this paper. Using the one-to-one correspondence between characters and translation in Oracle rubbings, the Oracle recognition problem is transformed into character matching problem, which can skip manual feature engineering experts, so as to realize efficient Oracle recognition. First, the coordinates of each character in the oracle bones are extracted. Then, the matching degree value of each oracle character corresponding to the translation of the oracle rubbings is assigned according to the coordinates. Finally, the maximum matching degree value of each character is searched using the improved ant colony algorithm, and the search result is the Chinese character corresponding to the oracle rubbings. In this paper, through experimental simulation, it is proved that this method is very effective when applied to the field of oracle recognition, and the recognition rate can approach 100% in some special oracle rubbings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.