2007
DOI: 10.1007/s10791-007-9028-6

An integrated system for building enterprise taxonomies

Abstract: Although considerable research has been conducted in the field of hierarchical text categorization, little has been done on automatically collecting labeled corpus for building hierarchical taxonomies. In this paper, we propose an automatic method of collecting training samples to build hierarchical taxonomies. In our method, the category node is initially defined by some keywords, the web search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm with keyword-base…

Cited by 6 publications (2 citation statements)
References 45 publications
“…However, human beings are sometimes involved to aid the construction of taxonomies [Zhang et al 2004; Gates et al 2005], making it rather complicated to evaluate. Here, we concentrate on those methods constructing taxonomies automatically.…”
Section: Taxonomy Generation Via Clustering (mentioning)
confidence: 99%
“…Most current approaches for creating taxonomies and the corresponding automated classifiers try to minimize the number of labeled training documents needed by various technologies that either use some sort of bootstrapping or clustering based on a set of documents [1,5,7] or seeding the process based on some examples that are (semi-)automatically extended [12,19]. Our approach focuses on using document sets retrieved based on search engine queries defined per category.…”
Section: Related Work (mentioning)
confidence: 99%