2000
DOI: 10.1023/a:1007649029923
BoosTexter: A Boosting-based System for Text Categorization

Abstract: This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system…
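The "new and improved family of boosting algorithms" the abstract refers to includes multi-label extensions of AdaBoost such as AdaBoost.MH. Below is a minimal sketch of a real-valued AdaBoost.MH-style trainer with word-presence weak hypotheses, in the spirit of BoosTexter; it is an illustration under assumptions, not the authors' implementation, and the function names, smoothing constant, and round count are invented for the example.

```python
import numpy as np

def adaboost_mh(docs, Y, vocab, T=50):
    """Minimal real-valued AdaBoost.MH-style sketch (illustrative only).

    docs  : list of token sets, one per document
    Y     : (n_docs, n_labels) array with entries +1 / -1
    vocab : candidate terms for the word-presence weak hypotheses
    T     : number of boosting rounds (arbitrary default)
    """
    n, k = Y.shape
    D = np.full((n, k), 1.0 / (n * k))        # distribution over (doc, label) pairs
    X = np.array([[w in d for w in vocab] for d in docs])  # term-presence matrix
    H = []                                     # ensemble: (term index, c0, c1) per round
    for _ in range(T):
        best = None
        for j in range(len(vocab)):
            cs, z = [], 0.0
            for b in (0, 1):                   # b = 0: term absent, b = 1: term present
                mask = X[:, j] == b
                Wp = (D[mask] * (Y[mask] == 1)).sum(axis=0)   # weight of positives
                Wn = (D[mask] * (Y[mask] == -1)).sum(axis=0)  # weight of negatives
                cs.append(0.5 * np.log((Wp + 1e-9) / (Wn + 1e-9)))  # smoothed confidence
                z += 2.0 * np.sqrt(Wp * Wn).sum()  # Z criterion minimized each round
            if best is None or z < best[0]:
                best = (z, j, cs[0], cs[1])
        _, j, c0, c1 = best
        H.append((j, c0, c1))
        pred = np.where(X[:, j:j + 1], c1, c0)  # per-label real-valued outputs
        D *= np.exp(-Y * pred)                  # reweight (doc, label) pairs
        D /= D.sum()
    return H

def score(doc, H, vocab):
    """Combined real-valued score f(x, y) for every label y."""
    f = 0.0
    for j, c0, c1 in H:
        f = f + (c1 if vocab[j] in doc else c0)
    return f
```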

Cited by 1,725 publications (67 citation statements)
References: 27 publications
“…A boosting-based text categorization algorithm was proposed by R. E. Schapire and Y. Singer [12], which laid the foundation for later research on text classification with boosting, but its weak hypotheses are determined only by the presence or absence of features, and it mainly targets English text classification. Junli Chen et al. proposed a boosting-based multi-label classification algorithm that can effectively solve the Chinese text classification problem [14], but that algorithm does not select the best weak hypothesis in each round of the iteration.…”
Section: Related Work (mentioning)
confidence: 99%
“…When researchers use the AdaBoost algorithm for classification, they generally use decision trees as the weak hypotheses. Each feature is regarded as a one-node decision tree whose only test is whether a document contains the feature w [12], and the AdaBoost algorithm returns "+1" or "-1" to tackle binary categorization problems. In this case, it is possible to assign negative samples that contain the feature w to the positive class, and positive samples that lack the feature w to the negative class.…”
Section: Introduction (mentioning)
confidence: 99%
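The one-node stump described in the excerpt above is easy to state concretely. A minimal sketch follows, assuming a discrete ±1-output weak hypothesis keyed to the presence of a single term w; the names are hypothetical:

```python
# Hypothetical illustration of the one-node "word presence" stump described
# above: the weak hypothesis tests only whether a document contains term w.
def presence_stump(w, polarity=+1):
    """Return h(doc) in {+1, -1} based solely on the presence of term w.

    polarity=+1 predicts the positive class when w is present;
    polarity=-1 flips the prediction. As the excerpt notes, such a stump
    can misclassify negative documents that happen to contain w and
    positive documents that lack it.
    """
    def h(doc_tokens):
        return polarity if w in doc_tokens else -polarity
    return h

h = presence_stump("excellent")
print(h({"an", "excellent", "movie"}))  # +1
print(h({"a", "dull", "movie"}))        # -1
```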
“…One straightforward choice is to use zero as the calibration constant [7,8]. An alternative choice for the calibration constant is 0.5, when the multi-label learned model f(x, y) represents the posterior probability of y being a proper label of x [10,11,23].…”
Section: ELM-ML (mentioning)
confidence: 99%
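A short sketch of the calibration idea in the excerpt above: a constant t turns the real-valued scores f(x, y) into a predicted label set, with t = 0 for margin-style scores and t = 0.5 for posterior-probability outputs. The function and variable names here are illustrative:

```python
# Hypothetical sketch: thresholding a multi-label scorer f(x, y) with a
# calibration constant t. t = 0.0 suits margin-style scores; t = 0.5 suits
# models whose output is the posterior probability P(y is a label of x).
def predict_labels(f, x, labels, t=0.0):
    return [y for y in labels if f(x, y) > t]

# Toy posterior-style scorer, so t = 0.5 is the natural calibration constant.
scores = {"sports": 0.9, "politics": 0.2, "finance": 0.6}
f = lambda x, y: scores[y]
print(predict_labels(f, "doc-1", list(scores), t=0.5))  # ['sports', 'finance']
```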
“…Problem transformation methods are convenient and fast to implement, owing to the number of existing techniques and their freely available software. Representative algorithms include Binary Relevance [7], AdaBoost.MH [8], Calibrated Label Ranking [3], Random k-labelsets [9], etc.…”
Section: Introduction (mentioning)
confidence: 99%
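Binary Relevance, the first transformation method listed in the excerpt above, decomposes a multi-label task into one independent binary problem per label. A minimal sketch, assuming scikit-learn is available and using logistic regression as an arbitrary base classifier:

```python
# Minimal Binary Relevance sketch: one independent binary classifier per
# label. Assumes scikit-learn; LogisticRegression is an arbitrary choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    def fit(self, X, Y):                 # Y: (n_samples, n_labels) in {0, 1}
        self.models = [LogisticRegression().fit(X, Y[:, j])
                       for j in range(Y.shape[1])]
        return self

    def predict(self, X):                # stack the per-label decisions
        return np.column_stack([m.predict(X) for m in self.models])

X = np.random.randn(100, 5)
Y = (np.random.rand(100, 3) > 0.5).astype(int)
print(BinaryRelevance().fit(X, Y).predict(X[:2]))
```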
“…For example, Srivastava et al. point out that deep neural networks (DNNs) with a large number of parameters are powerful machine learning models but suffer severely from overfitting [3]. To address the overfitting problem and pursue better classification performance, several ensemble approaches have been proposed, such as BoosTexter and Bonzaiboost [4,5]. However, the accuracy of these methods is far from satisfactory, and large networks such as DNNs are slow to use, making it difficult to deal with overfitting by combining many different large neural nets [3].…”
Section: Introduction (mentioning)
confidence: 99%