Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval 2012
DOI: 10.1145/2348283.2348511

Short text classification using very few words

Abstract: We propose a simple, scalable, and non-parametric approach for short text classification. Leveraging the well-studied and scalable Information Retrieval (IR) framework, our approach mimics the human labeling process for a piece of short text. It first selects the most representative and topic-indicative words from a given short text as query words, and then searches for a small set of labeled short texts that best match the query words. The predicted category label is the majority vote of the search results. Evalu…
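The search-and-vote procedure described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the whitespace tokenizer, the use of IDF to rank "representative" query words, the IDF-sum retrieval score standing in for a full IR engine, and the class and parameter names (`SearchVoteClassifier`, `num_query_words`, `top_k`) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class SearchVoteClassifier:
    """Hypothetical sketch: select query words, retrieve labeled texts, vote."""

    def __init__(self, labeled_texts, num_query_words=3, top_k=5):
        self.docs = [(tokenize(t), label) for t, label in labeled_texts]
        self.num_query_words = num_query_words
        self.top_k = top_k
        n = len(self.docs)
        df = Counter()
        for tokens, _ in self.docs:
            df.update(set(tokens))  # document frequency of each word
        # Assumption: smoothed IDF as the "representativeness" weight.
        self.idf = {w: math.log((n + 1) / (c + 1)) + 1 for w, c in df.items()}

    def select_query_words(self, tokens):
        # Assumption: keep in-vocabulary words, rank by IDF (ties broken
        # alphabetically so results are deterministic), take the top few.
        known = [w for w in set(tokens) if w in self.idf]
        known.sort(key=lambda w: (-self.idf[w], w))
        return known[: self.num_query_words]

    def search(self, query):
        # Score each labeled text by the summed IDF of matched query words
        # and return the labels of the top_k best-matching texts.
        scored = []
        for i, (tokens, label) in enumerate(self.docs):
            token_set = set(tokens)
            score = sum(self.idf[w] for w in query if w in token_set)
            if score > 0:
                scored.append((score, i, label))
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [label for _, _, label in scored[: self.top_k]]

    def predict(self, text):
        labels = self.search(self.select_query_words(tokenize(text)))
        if not labels:
            return None  # nothing matched; no vote possible
        votes = Counter(labels)
        best = max(votes.values())
        # Majority vote; ties go to the label ranked highest in the results.
        for label in labels:
            if votes[label] == best:
                return label
```

Usage: build the classifier from `(text, label)` pairs and call `predict` on a new short text. Keeping both the number of query words and the number of retrieved texts small mirrors the abstract's "very few words" and "small set of labeled short texts"; a production version would delegate retrieval to an IR engine rather than scanning every document.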

Cited by 83 publications (37 citation statements)
References 5 publications
“…To evaluate the performance of the above approach, we use the Web snippet dataset used in (Phan et al., 2008; Chen et al., 2011; Sun, 2012) … [Table 2: The number of unseen words …] … to query the web search engine (Google) and selected the top 20 or 30 snippets from the search results. Different phrases for the training and test data were used to make sure that test data were difficult to classify (Phan et al., 2008).…”
Section: Dataset and Evaluation Metrics (mentioning; confidence: 99%)
“…Traditional text classification methods such as KNN [3], Bayesian classification [4], decision trees [5], SVM [6], and maximum entropy [7] have achieved good results. However, these methods require sufficient word co-occurrence and frequency information within each text, so they are not effective when applied to short text classification.…”
Section: Related Work (mentioning; confidence: 99%)
“…Likewise, Chen et al. moved further in this direction and proposed a method that leverages topics at multiple granularities [28]. Taking a different approach, Sun [29] used a search-and-vote strategy over search results to label the query candidates. Meng et al. present an effective algorithm for measuring the semantic similarity of word pairs [30] and a new similar-query metric for query suggestion, which measures query similarity at the semantic level [31].…”
Section: Short Text Processing (mentioning; confidence: 99%)