Proceedings of the 2008 ACM Symposium on Applied Computing (2008)
DOI: 10.1145/1363686.1363899

Discovering relationships among categories using misclassification information

Abstract: Knowledge of relationships among categories is of interest in different domains such as text classification, content analysis, and text mining. We propose and evaluate approaches to effectively identify relationships among document categories. Our proposed novel method capitalizes on the misclassification results of a text classifier to identify potential relationships among categories. We demonstrate that our system detects such relationships, even those relationships that assessors failed to identify in …
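The core idea in the abstract, reading a classifier's misclassifications as hints that two categories are related, can be illustrated with a minimal sketch. This is not the authors' system: the function name `related_category_pairs`, the `min_rate` threshold, and the toy labels are all assumptions made for illustration. The sketch flags an ordered pair of categories when a large enough share of one category's documents is predicted as the other.

```python
from collections import Counter


def related_category_pairs(true_labels, predicted_labels, min_rate=0.2):
    """Return (true, predicted, rate) triples for confused category pairs.

    rate is the share of documents whose true category is `true` that the
    classifier assigned to `predicted`. Pairs below min_rate are dropped.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    totals = Counter(true_labels)
    confusions = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    pairs = [
        (t, p, count / totals[t])
        for (t, p), count in confusions.items()
        if count / totals[t] >= min_rate
    ]
    # Strongest confusion first.
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Toy data (invented): autos and motorcycles are often confused, space is not.
true_labels = ["autos", "autos", "autos", "autos",
               "motorcycles", "motorcycles", "space", "space"]
predicted = ["autos", "motorcycles", "motorcycles", "autos",
             "autos", "motorcycles", "space", "space"]

pairs = related_category_pairs(true_labels, predicted, min_rate=0.4)
for true_cat, predicted_cat, rate in pairs:
    print(f"{true_cat} -> {predicted_cat}: {rate:.2f}")
```

On this toy input only the autos/motorcycles pair crosses the threshold, in both directions. A real pipeline would also need to decide whether a confusion reflects a genuine relationship or just a weak classifier, which this sketch does not attempt.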

Cited by 10 publications (6 citation statements)
References 8 publications
“…One should note that much more relationships were identified between categories in the ODP46 dataset than in the 20NG dataset. This finding was reported in [10] and was also validated by human evaluators. As the precision for predicting the relationships in the 20NG dataset (59.0%) is higher than in the ODP46 dataset (31.8%), the probability of wrongly lowering a true positive is higher in the OPD46 dataset.…”
Section: Effects Of Misclassification Information (supporting)
confidence: 80%
“…While performing text classification on the documents, we keep track of the number of keywords [10] in each document. Keywords are terms that are found to consistently occur in documents of a category, and are not frequently found in documents of other categories.…”
Section: Least Ambiguous Segments (mentioning)
confidence: 99%
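The keyword definition quoted above (terms that occur consistently within a category and rarely elsewhere) can be sketched with document frequencies. The thresholds `min_in` and `max_out`, the function name `category_keywords`, and the toy documents are assumptions for illustration; the cited work may select keywords differently.

```python
from collections import defaultdict


def category_keywords(docs, min_in=0.6, max_out=0.2):
    """Pick per-category keywords from (category, tokens) pairs.

    A term is kept for a category when it appears in at least min_in of that
    category's documents and in at most max_out of all other documents.
    Both thresholds are hypothetical.
    """
    n_docs = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))  # doc_freq[category][term]
    for category, tokens in docs:
        n_docs[category] += 1
        for term in set(tokens):
            doc_freq[category][term] += 1

    total_docs = sum(n_docs.values())
    keywords = {}
    for category in n_docs:
        outside = total_docs - n_docs[category]
        selected = set()
        for term, count in doc_freq[category].items():
            in_rate = count / n_docs[category]
            out_count = sum(
                doc_freq[other].get(term, 0) for other in n_docs if other != category
            )
            out_rate = out_count / outside if outside else 0.0
            if in_rate >= min_in and out_rate <= max_out:
                selected.add(term)
        keywords[category] = selected
    return keywords


# Toy corpus (invented) with two categories.
docs = [
    ("space", ["orbit", "launch", "the"]),
    ("space", ["orbit", "nasa", "the"]),
    ("autos", ["engine", "the", "car"]),
    ("autos", ["engine", "car", "launch"]),
]

keywords = category_keywords(docs)
# Number of category keywords found in each document.
keyword_counts = [len(set(tokens) & keywords[cat]) for cat, tokens in docs]
```

Here "the" is rejected because it is common outside each category, and "launch" is rejected because it is not consistent within either one.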
“…Due to the large amount of online news documents, effective management of these resources has become a challenging goal for researchers in the field of information retrieval, text categorization and text mining. Towards this, the most common functions include clustering and classifying news documents [4,7,26,35,39], extracting semantic relationships between entities from news documents [37,40,41], detecting event from news stories [3,18,28,34], and discovering meaningful relations among news documents [20,30,38]. Among these tasks, discovering news relations has recently been focused among several studies.…”
Section: Introduction (mentioning)
confidence: 99%
“…Towards automated content organization, while classification techniques can be applied to assign a category label to each document based on a number of criteria, such as text genre, text style, and users' interest [1]- [3]. Some of them can be adopted for classifying news documents [4], [5]. By the classification method, it requires users to provide a number of predefined classes and a large number of training examples.…”
Section: Introduction (mentioning)
confidence: 99%