2002
DOI: 10.1007/3-540-46119-1_1

Integrating Background Knowledge into Nearest-Neighbor Text Classification

Abstract: This paper describes two different approaches for incorporating background knowledge into nearest-neighbor text classification. Our first approach uses background text to assess the similarity between training and test documents rather than assessing their similarity directly. The second method redescribes examples using Latent Semantic Indexing on the background knowledge, assessing document similarities in this redescribed space. Our experimental results show that both approaches can improve the performance of…
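The second method described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the toy corpora, the scikit-learn components (`TfidfVectorizer`, `TruncatedSVD`), the component count, and the cosine-based 1-nearest-neighbor rule are all assumptions standing in for the paper's actual setup. The key idea it demonstrates is that LSI is fit on the background corpus only, and both training and test documents are projected ("redescribed") into that space before similarity is measured.

```python
# Hypothetical sketch of LSI-redescription for nearest-neighbor text
# classification: fit LSI on background documents only, project labeled
# and test documents into that space, then classify each test document
# by its most similar training neighbor.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Toy corpora -- purely illustrative stand-ins for real datasets.
background = [
    "stocks fell sharply as the market opened",
    "the market rallied on strong earnings",
    "the team won the championship game",
    "players scored two late goals in the match",
]
train_docs = [
    "shares dropped after the earnings report",
    "the squad won the final match",
]
train_labels = ["finance", "sports"]
test_docs = ["the club won the cup"]

# LSI (the "redescription") is derived from the background corpus only.
vec = TfidfVectorizer().fit(background)
lsi = TruncatedSVD(n_components=2, random_state=0).fit(vec.transform(background))

def redescribe(docs):
    """Project documents into the background-derived LSI space."""
    return lsi.transform(vec.transform(docs))

X_train = redescribe(train_docs)

def cosine(a, b):
    # Small epsilon guards against zero vectors (no shared vocabulary).
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def predict(docs):
    """Label each document by its 1-nearest training neighbor
    (cosine similarity) in the redescribed space."""
    return [train_labels[int(np.argmax([cosine(x, t) for t in X_train]))]
            for x in redescribe(docs)]

print(predict(test_docs))
```

Note the design point: the background text never contributes labels; it only shapes the reduced space in which similarity is judged, which is what distinguishes this from ordinary semi-supervised training on unlabeled examples.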

Cited by 20 publications (11 citation statements)
References 3 publications
“…More recently, in the field of machine learning, the combined use of labeled and unlabeled examples has been found effective for different tasks (Seeger 2000). Specifically, there are several semi-supervised methods for text categorization, which in turn are based on different learning algorithms, such as Naïve Bayes (Nigam et al 2000; Peng et al 2004), Support Vector Machines (Joachims 1999), and nearest-neighbor algorithms (Zelikovitz and Hirsh 2002). Our method differs from all these previous approaches in two main respects: it does not require a predefined set of unlabeled data; instead, it considers their automatic extraction from the Web.…”
Section: Semi-supervised Learning
Confidence: 98%
“…These go beyond the bag-of-words model, providing a more advanced view of the document space that may require less knowledge-engineering activity within a TCBR theme [17][18][19]. Unfortunately, these also have crucial drawbacks.…”
Section: Introduction
Confidence: 94%
“…They demonstrated that incorporating the two adopted knowledge sources improves the classifier's performance. In (Zelikovitz and Hirsh 2003), various knowledge sources external to the primary source to be classified were utilized. For the classification of journal titles, the classifiers used abstracts and reviews.…”
Section: Document Classification
Confidence: 99%