2012 11th International Conference on Information Science, Signal Processing and Their Applications (ISSPA) 2012
DOI: 10.1109/isspa.2012.6310552

Automatic Document Topic Identification using Wikipedia Hierarchical Ontology

Abstract: The rapid growth in the number of documents available to end users from around the world has led to a greatly increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. In this work, a novel technique is proposed to automatically construct a background knowledge structure in the form of a hierarchical ontology, using one of the largest online knowledge repositories: Wikipedia. Then, a nove…

Citations: cited by 11 publications (8 citation statements)
References: 16 publications
“…The details of the algorithm have been omitted due to space limitations. For more details, we refer to our work in Hassan (2013).…”
Section: Basic Methodology (mentioning)
confidence: 99%
“…The word "identification" in ADTI means finding the best match between the input document set and the input topic list. In contrast to other approaches, the list of topics is a known entity in our approach, which means that there is no need to predict them (Hassan 2013). To better understand the difference, let us consider the following example: assume that we are interested in the topics economics, politics, and sports and we have a document declaring "Barack Obama is the new President of the United States."…”
Section: Basic Methodology (mentioning)
confidence: 99%
“…In [10], the authors propose a method that uses Wikipedia article titles as well as the category network to identify topics of documents. [11] introduces a method where they first, construct a category-term matrix C from the Wikipedia categories and articles text. Then, they construct a document-term matrix D for the input document and as the final step, calculate the document-category similarity matrix $S = DC^{T}$, in order to find the relevant topics of a document.…”
Section: Related Work (mentioning)
confidence: 99%
“…Then, they select categories assigned to these articles and rank them, and finally choose the categories with the highest weights as the topics of the document. [11] proposes a method that constructs a category-term matrix C from Wikipedia exploiting categories and articles text. Then, for the input document a document-term matrix D is constructed.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the single document summarization, several deep natural language analysis methods are applied. These strategies of document summarization use ontology knowledge based summarization [9,11]. The ontology sources commonly used are WordNet, UMLS.…”
Section: Related Work (mentioning)
confidence: 99%
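
Two of the citation statements above describe the cited method as building a category-term matrix C and a document-term matrix D, then computing a document-category similarity matrix as the product of D with the transpose of C. The following is a minimal sketch of that matrix product only, not the authors' implementation. The toy category texts, the documents, the raw term-count weighting, and the helper names `term_vector` and `similarity_matrix` are all assumptions made for illustration; the paper may use a different weighting scheme and builds C from Wikipedia categories and article text.

```python
from collections import Counter


def term_vector(text, vocabulary):
    """Return raw term counts for `text`, ordered by `vocabulary` (assumed weighting)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def similarity_matrix(D, C):
    """Compute S = D C^T: entry (i, j) is the dot product of document i and category j."""
    return [
        [sum(d * c for d, c in zip(d_row, c_row)) for c_row in C]
        for d_row in D
    ]


# Hypothetical stand-ins for text gathered from Wikipedia categories.
categories = {
    "politics": "president election government",
    "sports": "football match team",
    "economics": "market trade inflation",
}
# Hypothetical input documents.
documents = [
    "the president won the election",
    "the team won the football match",
]

# The vocabulary is taken from the category texts only.
vocabulary = sorted({word for text in categories.values() for word in text.split()})
category_names = list(categories)

C = [term_vector(categories[name], vocabulary) for name in category_names]  # category-term
D = [term_vector(doc, vocabulary) for doc in documents]                     # document-term
S = similarity_matrix(D, C)                                                 # document-category

# For each document, pick the category with the highest similarity score.
best_topics = [category_names[row.index(max(row))] for row in S]
print(best_topics)
```

Because the topic list is fixed in advance, each row of S only needs to be ranked against the known categories, which matches the "identification" framing quoted in the Basic Methodology statement.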