Automated topic labelling benefits users who want to analyse and understand document collections, as well as search engines that link groups of words to their underlying topics. Current approaches suffer in quality, but we argue their performance can be improved by focusing on the structure in the data. Building upon research on concept disambiguation and linking to DBpedia, we take a novel approach to topic labelling that exploits the structured data exposed by DBpedia. We start from the hypothesis that words co-occurring in text are likely to refer to concepts that lie close together in the DBpedia graph. Using graph centrality measures, we show that we can identify the concepts that best represent the topics. We comparatively evaluate our graph-based approach and the standard text-based approach on topics extracted from three corpora, based on results gathered in a crowd-sourcing experiment. Our research shows that graph-based analysis of DBpedia achieves better results for topic labelling in terms of both precision and topic coverage.
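The following is a minimal sketch, not the authors' implementation, of how candidate DBpedia concepts for a topic could be ranked by a graph centrality measure. It assumes the candidate concepts and the DBpedia links between them have already been collected; the choice of closeness centrality and the example concepts are illustrative assumptions only.

```python
# Hypothetical sketch: rank candidate DBpedia concepts for a topic by
# centrality in the subgraph induced by DBpedia links between them.
import networkx as nx

def label_topic(candidate_concepts, dbpedia_links, top_k=3):
    graph = nx.Graph()
    graph.add_nodes_from(candidate_concepts)
    graph.add_edges_from(dbpedia_links)
    # Concepts that are most central among the topic's candidates are
    # taken as the best representatives (labels) of the topic.
    centrality = nx.closeness_centrality(graph)
    ranked = sorted(candidate_concepts,
                    key=lambda c: centrality.get(c, 0.0),
                    reverse=True)
    return ranked[:top_k]

# Illustrative example: candidates gathered for topic words about jazz music.
concepts = ["dbr:Jazz", "dbr:Saxophone", "dbr:Improvisation", "dbr:Music_genre"]
links = [("dbr:Jazz", "dbr:Saxophone"),
         ("dbr:Jazz", "dbr:Improvisation"),
         ("dbr:Jazz", "dbr:Music_genre")]
print(label_topic(concepts, links))  # "dbr:Jazz" ranks first
```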
Semantic relatedness and disambiguation are fundamental problems for linking text documents to the Web of Data. Many approaches address both problems, but most of them rely on word or concept distributions over Wikipedia and are therefore not applicable to concepts that lack a rich textual description. In this paper, we show that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base. In addition, we propose a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness. As opposed to the majority of state-of-the-art systems, which target mainly named entities, we use our approach to disambiguate both entities and common nouns. In our experiments, we first validate our relatedness measure on multiple knowledge bases and ground-truth datasets and show that it performs better than related state-of-the-art graph-based measures. Afterwards, we evaluate the disambiguation algorithm and show that it also achieves superior disambiguation accuracy with respect to alternative state-of-the-art graph-based algorithms.
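As a rough illustration of the idea, the sketch below scores relatedness purely from the graph structure of a knowledge base and uses it for a greedy joint disambiguation pass. The distance-based score and the greedy selection are illustrative assumptions; the paper's own measure and algorithm may differ.

```python
# Hypothetical sketch: purely graph-based relatedness plus a greedy
# joint disambiguation step. `kb_graph` is assumed to be a networkx
# graph whose nodes are knowledge-base concepts.
import networkx as nx

def relatedness(kb_graph, concept_a, concept_b):
    # One simple graph-only choice: concepts closer in the graph are
    # considered more related (no textual descriptions are used).
    try:
        dist = nx.shortest_path_length(kb_graph, concept_a, concept_b)
    except nx.NetworkXNoPath:
        return 0.0
    return 1.0 / (1.0 + dist)

def disambiguate(kb_graph, candidates_per_mention):
    # For each mention (entity or common noun), pick the candidate whose
    # total relatedness to the other mentions' candidates is highest.
    chosen = []
    for i, candidates in enumerate(candidates_per_mention):
        others = [c for j, cs in enumerate(candidates_per_mention)
                  if j != i for c in cs]
        best = max(candidates,
                   key=lambda c: sum(relatedness(kb_graph, c, o) for o in others))
        chosen.append(best)
    return chosen
```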
Social sites and services rely on the continuing activity, good will and behaviour of their contributors to remain viable. There has been little empirical study of the mechanisms by which social sites maintain a viable user base. Such studies would provide a scientific understanding of the patterns that lead to user churn (i.e. users leaving the community) and of the community dynamics associated with a shrinking membership, which are primary threats to the sustainability of any service. In this paper, we explore the relation between a user's value within a community, constituted from various user features, and the probability of that user churning.
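A minimal sketch of the kind of analysis this implies is shown below: modelling churn probability from per-user features with a simple classifier. The feature names and data are hypothetical, not taken from the study.

```python
# Hypothetical sketch: estimate a user's churn probability from
# activity features using logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [posts_per_week, replies_received, account_age_days] (made-up values)
X = np.array([[12, 30, 400],
              [ 1,  0,  30],
              [ 8, 15, 200],
              [ 0,  1,  60]])
y = np.array([0, 1, 0, 1])  # 1 = user churned

model = LogisticRegression().fit(X, y)
# Estimated churn probability for a new, low-activity user.
print(model.predict_proba([[2, 3, 90]])[:, 1])
```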