1961
DOI: 10.1145/321075.321084

Automatic Indexing: An Experimental Inquiry

Abstract: This inquiry examines a technique for automatically classifying (indexing) documents according to their subject content. The task, in essence, is to have a computing machine read a document and on the basis of the occurrence of selected clue words decide to which of many subject categories the document in question belongs. This paper describes the design, execution and evaluation of a modest experimental study aimed at testing empirically one statistical technique for automatic indexing.
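The abstract does not spell out which statistical technique is tested. One of the citation statements on this page describes Maron (1961) as an early Bayesian method for text classification. Assuming that reading, and also assuming that clue words are treated as conditionally independent given the category, a document containing clue words $w_1, \dots, w_n$ could be scored against each subject category $C_j$ roughly as follows. The notation is ours, and this is a sketch under those assumptions rather than a reproduction of the paper's own formula:

$$
P(C_j \mid w_1, \dots, w_n) = \frac{P(C_j) \prod_{i=1}^{n} P(w_i \mid C_j)}{\sum_{k} P(C_k) \prod_{i=1}^{n} P(w_i \mid C_k)},
\qquad
\hat{C} = \arg\max_{j} \; P(C_j) \prod_{i=1}^{n} P(w_i \mid C_j)
$$

The denominator is the same for every category, so it can be dropped when only the best-scoring category is needed.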

Cited by 445 publications (215 citation statements); references 1 publication.

“…The early research work on text categorization focused on how to use a limited number of key words extracted from a given document to index, or classify the document into a predefined set of subject categories. For example, Maron [11] proposed to use [a] statistical technique to automatically index documents. In Maron's experiments, documents were short and clearly written, and the topics of the documents were limited [and] not too heterogeneous.…”
Section: Related Work (mentioning; confidence: 99%)

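The keyword-based indexing described in the statement above can be illustrated with a deliberately simple sketch. The clue-word lists, the category names, and the helpers `extract_clue_words` and `index_by_overlap` are hypothetical. The assignment rule (pick the category sharing the most clue words with the document) is a plain keyword-overlap heuristic, not the statistical technique Maron tested.

```python
from __future__ import annotations

import re

# Hypothetical clue-word lists per subject category (illustrative only,
# not taken from Maron's experiment).
CLUE_WORDS = {
    "computers": {"program", "storage", "binary", "circuit"},
    "physics": {"particle", "energy", "field", "nuclear"},
}


def extract_clue_words(text: str, vocabulary: set[str]) -> set[str]:
    """Return the clue words from `vocabulary` that occur in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return vocabulary.intersection(tokens)


def index_by_overlap(text: str) -> str | None:
    """Assign the category sharing the most clue words with the document.

    Returns None when no clue word occurs at all.
    """
    all_clues = set().union(*CLUE_WORDS.values())
    found = extract_clue_words(text, all_clues)
    if not found:
        return None
    return max(CLUE_WORDS, key=lambda cat: len(CLUE_WORDS[cat] & found))


print(index_by_overlap("The program is kept in binary storage."))  # computers
```

A rule like this ignores how reliably each clue word signals a category; weighting words by estimated probabilities is what a statistical approach adds.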
“…An automatically generated labelled dataset, the Open Directory Project dataset -CategoryDocuments [11], is employed as the experimental dataset to evaluate the above five widely used text categorization algorithms.…”
Section: Experimental Dataset (mentioning; confidence: 99%)

“…The English senses together with their aligned translations (and probability scores) are then stored in a word sense translation table, in which look-ups are performed during the testing phase. The Naive Bayes (NB) [11] classifier is a probabilistic classifier that assumes that the features are independent. We compare with this classifier because it has similarities with our new approach, that is, it is also based on the frequencies of the features, but it does not take into account the sparse nature of the data.…”
Section: Experimental Set-up (mentioning; confidence: 99%)

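The frequency-based Naive Bayes classifier described in the statement above can be sketched as a multinomial model with add-one (Laplace) smoothing. This is a generic textbook formulation chosen for illustration. The class name `TinyMultinomialNB`, the regex tokenizer, the smoothing choice, and the decision to skip words unseen in training are assumptions, not details from the citing paper or from Maron's study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # Document counts per class give the prior P(C).
        self.class_counts = Counter(labels)
        self.n_docs = len(labels)
        # Word frequencies per class give the likelihoods P(w | C).
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_score(self, doc, label):
        """log P(C) plus the sum of log P(w | C) over the document's tokens."""
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in tokenize(doc):
            if w not in self.vocab:
                continue  # skip words never seen during training
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest log score."""
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))


train_docs = [
    "the program runs in binary storage",
    "binary program storage circuit",
    "nuclear particle energy field",
    "energy of the particle field",
]
train_labels = ["computers", "computers", "physics", "physics"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("binary storage program"))  # computers
```

Working in log space avoids floating-point underflow on long documents, and the independence assumption is what allows the per-word log probabilities to be simply added.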
“…Unlike ROTE, however, Naive Bayes can be competitive with more powerful methods (Domingos & Pazzani, 1996). The idea of a Bayesian method for text classification is almost as old as the idea of automatic text classification itself (Maron, 1961). Lewis discusses the uses of such methods for text classification and retrieval (1992) and provides a large list of pointers to related research (1997).…”
Section: Related Work (mentioning; confidence: 99%)