Semantic similarity is a fundamental concept in computational linguistics. The models used for the representation of text have a major role in similarity computation. The text with multilingual and multimodal components shows the need for computing similarity based on different characteristics of text. This chapter studies various aspects of semantic similarity of linguistic units, cross-level similarity, semantic models, and similarity measures. One of the main motivations of this chapter is to analyze semantic similarity models such as geometric models, feature-based models, graph-based models, vector space models, and formal concept analysis models. In addition, a composite summary score based on words and hashtags is applied for the tweet summarization task which is effective when compared with other measures.
Social media texts like tweets and blogs are collaboratively created by human interaction. Fast change in trends leads to topic drift in the social media text. This drift is usually associated with words and hashtags. However, geotags play an important part in determining topic distribution with location context. Rate of change in the distribution of words, hashtags and geotags cannot be considered as uniform and must be handled accordingly. This paper builds a topic model that associates topic with a mixture of distributions of words, hashtags and geotags. Stochastic gradient Langevin dynamic model with varying mini-batch sizes is used to capture the changes due to the asynchronous distribution of words and tags. Topical word embedding with co-occurrence and location contexts are specified as hashtag context vector and geotag context vector respectively. These two vectors are jointly learned to yield topical word embedding vectors related to tags context. Topical word embeddings over time conditioned on hashtags and geotags predict, location-based topical variations effectively. When evaluated with Chennai and UK geolocated Twitter data, the proposed joint topical word embedding model enhanced by the social tags context, outperforms other methods.
Social media texts like tweets and blogs are collaboratively created by human interaction. Fast change in trends leads to topic drift in the social media text. This drift is usually associated with words and hashtags. However, geotags play an important part in determining topic distribution with location context. Rate of change in the distribution of words, hashtags and geotags cannot be considered as uniform and must be handled accordingly. This paper builds a topic model that associates topic with a mixture of distributions of words, hashtags and geotags. Stochastic gradient Langevin dynamic model with varying mini-batch sizes is used to capture the changes due to the asynchronous distribution of words and tags. Topical word embedding with co-occurrence and location contexts are specified as hashtag context vector and geotag context vector respectively. These two vectors are jointly learned to yield topical word embedding vectors related to tags context. Topical word embeddings over time conditioned on hashtags and geotags predict, location-based topical variations effectively. When evaluated with Chennai and UK geolocated Twitter data, the proposed joint topical word embedding model enhanced by the social tags context, outperforms other methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.