Word embedding is a fundamental resource for natural language processing tasks: it generates distributed representations of words from large text corpora. Recent evidence demonstrates that introducing sememe knowledge is a promising strategy for improving word embedding performance. However, previous work has ignored the structural information in sememe knowledge. To fill this gap, this study implicitly incorporates the structural features of sememes into word embedding models through an attention mechanism. Specifically, we propose a novel double attention word-based embedding (DAWE) model that encodes the characteristics of sememes into words by a “double attention” strategy. DAWE is integrated with two specific word training models through context-aware semantic matching techniques. The experimental results show that, on the word similarity and word analogy reasoning tasks, incorporating the structural information of sememe knowledge effectively improves word embedding performance. A case study also verifies the effectiveness of the DAWE model on a word sense disambiguation task. Furthermore, the DAWE model is a general framework for encoding sememes into words and can be integrated into other existing word embedding models to provide more options for downstream natural language processing tasks.
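The abstract does not spell out how the attention over sememes is computed, so the following is only a minimal sketch of generic dot-product soft attention, not the DAWE model itself. It assumes a context vector is used as the query and that each sememe has its own embedding; the function names (`attend`, `softmax`, `dot`) and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, candidates):
    """Generic soft attention (an assumption, not the paper's exact method).

    Scores each candidate vector against the query, normalizes the scores
    with softmax, and returns the weights plus the weighted average vector.
    """
    weights = softmax([dot(query, c) for c in candidates])
    dim = len(candidates[0])
    pooled = [sum(w * c[i] for w, c in zip(weights, candidates))
              for i in range(dim)]
    return weights, pooled


# Hypothetical toy data: one context vector and three sememe embeddings.
context = [1.0, 0.0]
sememes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, word_vec = attend(context, sememes)
```

In this sketch the sememe most aligned with the context receives the largest weight, which is the general intuition behind context-aware sememe weighting. A "double attention" scheme would presumably apply attention at two levels, but the abstract does not give enough detail to reproduce that here.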
Human activities embedded in crowdsourced data, such as social media trajectories, reflect individuals' daily lifestyles and patterns, which are valuable in many applications. However, accurately identifying human activity types (HATs) from social media is challenging, possibly because interactions between posts and users at different times are overlooked. To fill this gap, we propose a novel model that captures the interactions hidden in social media and uses a Graph Convolutional Network (GCN) to identify HATs. The model first characterizes interactions among words, posts, dates, and users, and then derives a Time Gated Human Activity Graph Convolutional Network (TG-HAGCN) to predict the HATs of social media trajectories. To evaluate the proposed model, we built a new dataset from the open Yelp dataset that includes interactions among post content, post time, and users. Experimental results show that exploiting the interactions hidden in social media to recognize HATs achieves state-of-the-art performance with high accuracy. The study indicates that exploiting interactions in social media improves the ability of machine learning in social media data mining and intelligent applications, and it offers a reference solution for fusing multi-type heterogeneous data in social media.
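For readers unfamiliar with graph convolution, the sketch below shows one standard GCN propagation step, ReLU(D^(-1/2) (A + I) D^(-1/2) H W), in plain Python. It is not TG-HAGCN: the time gating and the word, post, date, and user graph construction are not specified in the abstract and are not attempted here. The tiny graph, feature matrix, and weight matrix are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution step followed by ReLU."""
    a_norm = normalize_adjacency(adj)
    z = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical 3-node path graph (0 - 1 - 2) with 2-dimensional node features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weight = [[1.0], [0.5]]  # illustrative values, not learned parameters
hidden = gcn_layer(adj, features, weight)
```

Each node's output mixes its own features with those of its neighbors. In a heterogeneous setting like the one described, the nodes would stand for words, posts, dates, and users rather than the anonymous nodes used here.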