Fifth IEEE International Conference on Data Mining (ICDM'05)
DOI: 10.1109/icdm.2005.80

Improving Automatic Query Classification via Semi-Supervised Learning

Abstract: Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose web search systems. Such classification becomes critical if the system is to return results not just from a general web collection but from topic-specific back-end databases as well. Maintaining sufficient classification recall is very difficult as web queries are typically short, yielding few features per query. This feature sparseness coupled with the high query volumes typical for a large-scal…

Cited by 80 publications (75 citation statements). References: 18 publications.

“…Therefore, some prior work in query-classification into topical categories is relevant to vertical selection [13,14,2,1,10]. Because queries are terse, many query-classification approaches augment the query with features beyond the query string, possibly derived from query-logs or corpora of documents associated with the target classes.…”
Section: Related Work (mentioning)
confidence: 99%
“…Because queries are terse, many query-classification approaches augment the query with features beyond the query string, possibly derived from query-logs or corpora of documents associated with the target classes. Beitzel et al use a large (unlabeled) query-log and a technique known as selectional preference: the query "interest rates" belongs to target category finance because "interest" and "rates" are distributionally similar to the term "finance" [1,2]. Shen et al [13] and other participants of the KDD 2005 Cup [11] use corpus-based evidence.…”
Section: Related Work (mentioning)
confidence: 99%
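The selectional-preference idea in the excerpt above (a query term is linked to a category when the two are distributionally similar in an unlabeled query log) can be illustrated with a deliberately simplified sketch. It assumes a tiny hypothetical log (`LOG`), treats co-occurrence within the same query as a term's context, and scores each category by the cosine similarity between context vectors. The function names are illustrative, and the cosine proxy is a stand-in, not the selectional-preference measure Beitzel et al. actually use.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy "query log"; a real system would mine a large unlabeled log.
LOG = [
    "bank interest rates",
    "mortgage interest rates",
    "finance news today",
    "bank finance loan",
    "mortgage finance calculator",
    "loan rates today",
    "football scores today",
    "sports news today",
    "football sports highlights",
    "basketball scores live",
    "sports scores live",
]


def context_vectors(queries):
    """Map each term to counts of the other terms it co-occurs with in a query."""
    vecs = defaultdict(Counter)
    for q in queries:
        terms = q.lower().split()
        for i, term in enumerate(terms):
            for j, other in enumerate(terms):
                if i != j:
                    vecs[term][other] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query, categories, vecs):
    """Score each category by the mean similarity of its name to the query terms."""
    terms = query.lower().split()
    empty = Counter()
    scores = {
        cat: sum(cosine(vecs.get(t, empty), vecs.get(cat, empty)) for t in terms) / len(terms)
        for cat in categories
    }
    return max(scores, key=scores.get), scores


VECS = context_vectors(LOG)
label, scores = classify("interest rates", ["finance", "sports"], VECS)
print(label, scores)
```

With this toy log, "interest" and "rates" share contexts such as "bank", "mortgage" and "loan" with "finance", so the query is assigned to finance; the scores depend entirely on the illustrative data.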
“…Finally, the queries were classified into the subject taxonomy using the classifiers through a consensus function. The main limitations of the proposed method are the dependency of the classification to the quality of the training data, the human effort involved in the training data construction and the semi-automatic nature of the approach which limits the scale of the method applications [8] and [9] and [15].…”
Section: Related Topics and Applications Of Query Classification Methods (mentioning)
confidence: 99%
“…user-click behavior or part-of-speech information, to perform the categorization task. (Beitzel et al, 2005) proposed a method for automatic query classification by leveraging unlabeled data within a semi-supervised learning framework. Their semi-supervised approach facilitated the augmentation of labeled training samples for the classification task.…”
Section: Search Intent Detection and Categorization (mentioning)
confidence: 99%
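The semi-supervised idea described in the last excerpt, enlarging a small labeled set with automatically labeled queries, can be illustrated by classic self-training. The sketch below is a generic self-training loop around a from-scratch multinomial naive Bayes classifier. It is a swapped-in textbook technique assumed for illustration, not necessarily the procedure of Beitzel et al. (2005), whose approach the earlier excerpts describe as mining selectional preferences from an unlabeled query log. `SEED`, `UNLABELED`, `threshold`, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import exp, log

# Hypothetical seed labels and unlabeled queries, for illustration only.
SEED = [
    ("stock prices", "finance"),
    ("bank loan", "finance"),
    ("football scores", "sports"),
    ("tennis match", "sports"),
]
UNLABELED = [
    "stock market news",
    "bank stock prices",
    "football match tickets",
    "tennis scores live",
]


def train_nb(labeled):
    """Fit multinomial naive Bayes counts from (query, label) pairs."""
    class_counts = Counter(label for _, label in labeled)
    term_counts = defaultdict(Counter)
    vocab = set()
    for query, label in labeled:
        terms = query.lower().split()
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_counts, term_counts, vocab


def posterior(model, query):
    """Return P(class | query) with Laplace (add-one) smoothing."""
    class_counts, term_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for c in class_counts:
        denom = sum(term_counts[c].values()) + len(vocab)
        score = log(class_counts[c] / total)
        for term in query.lower().split():
            score += log((term_counts[c][term] + 1) / denom)
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    z = sum(exp(s - top) for s in log_scores.values())
    return {c: exp(s - top) / z for c, s in log_scores.items()}


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Repeatedly add confidently auto-labeled queries to the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        added = []
        for q in pool:
            post = posterior(model, q)
            best = max(post, key=post.get)
            if post[best] >= threshold:
                added.append((q, best))
        if not added:
            break  # nothing cleared the threshold; stop early
        labeled.extend(added)
        newly_labeled = {q for q, _ in added}
        pool = [q for q in pool if q not in newly_labeled]
    return train_nb(labeled), labeled


model, expanded = self_train(SEED, UNLABELED)
for query, label in expanded[len(SEED):]:
    print(label, query)
```

Starting from four seed labels, the loop labels "bank stock prices", "football match tickets" and "tennis scores live" in the first round; the extra finance evidence then lets "stock market news" clear the threshold in the second round. The threshold trades coverage against the risk of propagating early mistakes into the training set.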