Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008
DOI: 10.1145/1390334.1390367
Enhancing text clustering by leveraging Wikipedia semantics

Abstract: Most traditional text clustering methods are based on a "bag of words" (BOW) representation built from frequency statistics over a set of documents. BOW, however, ignores important information about the semantic relationships between key terms. To overcome this problem, several methods have been proposed to enrich text representation with external resources, such as WordNet. However, many of these approaches suffer from some limitations: 1) WordNet has limited coverage and lacks effective word-se…
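As a concrete illustration of the BOW limitation the abstract describes, here is a minimal tf-idf sketch (the toy documents and whitespace tokenisation are assumptions for illustration, not the paper's setup):

```python
from collections import Counter
import math

def bow_tfidf(docs):
    """Build a simple bag-of-words tf-idf representation.

    Each document becomes a dict mapping term -> tf-idf weight.
    Note that synonyms (e.g. "car" vs "automobile") occupy
    unrelated dimensions, which is the limitation the paper
    addresses with Wikipedia semantics.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()  # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency in this document
        vectors.append({
            term: count * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = ["the car is fast", "the automobile is quick"]
vecs = bow_tfidf(docs)
# "car" and "automobile" share no dimension, so the two
# documents look dissimilar despite meaning the same thing.
```

Terms appearing in every document (like "the") get weight log(n/n) = 0, so only discriminative terms contribute, yet semantically equivalent terms still never align.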

Cited by 154 publications (97 citation statements) · References 13 publications
“…We performed lemmatisation, part-of-speech annotation, named entity tagging, and dependency parsing using the Stanford CoreNLP toolkit. We used the Jan. 30, 2010 English version of Wikipedia and processed it according to the method described by Hu et al. (2008).…”
Section: Experimental Settings
confidence: 99%
“…Examples include information retrieval [4,14,18], named entity disambiguation [1,2,7,8,11,12], text classification [25] and entity ranking [10]. To extract the content of an entity context, many studies directly used the Wikipedia article describing the entity [1,2,8,9,14,[25][26][27]; some works extended the article with all the other Wikipedia articles linked to the article describing the entity [6,7,12]; while some only considered the first paragraph of the Wikipedia article describing the entity [2]. Different from these approaches, our graph-based approach not only employs in-links and language links to broaden the set of articles likely to mention the entity, but also performs a finer-grained process: extracting the sentences that mention the entity, so that all the sentences in our context are closely related to the target entity.…”
Section: Related Work
confidence: 99%
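The finer-grained context extraction this excerpt describes, keeping only sentences that mention the target entity, can be sketched roughly as follows (the regex sentence splitter and alias list are simplifying assumptions, not the authors' implementation):

```python
import re

def entity_context(articles, entity, aliases=()):
    """Collect sentences that mention the entity (or an alias)
    from a set of article texts. A toy sketch of sentence-level
    context extraction; real systems use proper sentence
    segmentation and link structure to find candidate articles."""
    names = [entity, *aliases]
    pattern = re.compile("|".join(re.escape(n) for n in names),
                         re.IGNORECASE)
    sentences = []
    for text in articles:
        # naive split on sentence-ending punctuation
        for sent in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sent):
                sentences.append(sent)
    return sentences

articles = [
    "Paris is the capital of France. It has museums.",
    "The Eiffel Tower is in Paris.",
]
ctx = entity_context(articles, "Paris")
# keeps only the two sentences that mention "Paris"
```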
“…As to the context-based representation vector of the entity, [1,11] defined it as the tf-idf/word count/binary occurrence values of all the vocabulary words in the context content; [2,19] defined it as the word count/binary occurrence values of other entities in the context content; [5,6,9,14,25] defined it as the tf-idf similarity values between the target entity's context content and other entities' context contents from Wikipedia; [27] defined it as the visiting probability from the target entity to other entities in Wikipedia; [7,26] used a measure based on the common entities linked to the target entity and other entities in Wikipedia. Different from all earlier work, we employ aspect weights that interpret frequency and selectivity differently from typical tf-idf values and take co-occurrence and language specificity of the aspects into account.…”
Section: Related Work
confidence: 99%
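The tf-idf similarity between context contents mentioned in this excerpt is conventionally computed as cosine similarity over sparse term-weight vectors; a minimal sketch (the dict-based sparse representation is an assumption for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term -> weight dicts,
    the standard way to compare tf-idf context vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Identical vectors score 1.0, vectors with no shared terms score 0.0, matching the intuition that entities with overlapping context vocabulary are related.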
“…Various approaches have been proposed [3,15,5]. We take the same route as [9], and use Wikipedia's vocabulary of anchor texts to connect words and phrases to Wikipedia articles.…”
Section: Selecting Relevant Wikipedia Concepts
confidence: 99%
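The anchor-text lookup this excerpt describes can be sketched as a greedy longest-match against a phrase-to-article dictionary (the toy vocabulary below is an assumption for illustration, not the actual Wikipedia-mined anchor vocabulary):

```python
def link_phrases(text, anchor_vocab):
    """Map phrases in `text` to article titles via an anchor-text
    vocabulary (phrase -> article title), preferring the longest
    matching phrase at each position."""
    tokens = text.lower().split()
    links, i = [], 0
    while i < len(tokens):
        # try the longest candidate phrase first
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in anchor_vocab:
                links.append((phrase, anchor_vocab[phrase]))
                i = j
                break
        else:
            i += 1  # no phrase starts here; move on
    return links

vocab = {
    "machine learning": "Machine_learning",
    "clustering": "Cluster_analysis",
}
links = link_phrases("machine learning improves clustering", vocab)
# -> [("machine learning", "Machine_learning"),
#     ("clustering", "Cluster_analysis")]
```

Preferring the longest match keeps multi-word anchors like "machine learning" from being split into their less specific parts.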