2000
DOI: 10.1023/a:1007649029923
BoosTexter: A Boosting-based System for Text Categorization

Abstract: This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system…
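The "new and improved family of boosting algorithms" the abstract refers to includes multi-label extensions of AdaBoost such as AdaBoost.MH. Below is a minimal sketch of a real-valued AdaBoost.MH-style trainer with word-presence weak hypotheses, in the spirit of BoosTexter; it is an illustration under assumptions, not the authors' implementation, and the function names, smoothing constant, and round count are invented for the example.

```python
import numpy as np

def adaboost_mh(docs, Y, vocab, T=50):
    """Minimal real-valued AdaBoost.MH-style sketch (illustrative only).

    docs  : list of token sets, one per document
    Y     : (n_docs, n_labels) array with entries +1 / -1
    vocab : candidate terms for the word-presence weak hypotheses
    T     : number of boosting rounds (arbitrary default)
    """
    n, k = Y.shape
    D = np.full((n, k), 1.0 / (n * k))        # distribution over (doc, label) pairs
    X = np.array([[w in d for w in vocab] for d in docs])  # term-presence matrix
    H = []                                     # ensemble: (term index, c0, c1) per round
    for _ in range(T):
        best = None
        for j in range(len(vocab)):
            cs, z = [], 0.0
            for b in (0, 1):                   # b = 0: term absent, b = 1: term present
                mask = X[:, j] == b
                Wp = (D[mask] * (Y[mask] == 1)).sum(axis=0)   # weight of positives
                Wn = (D[mask] * (Y[mask] == -1)).sum(axis=0)  # weight of negatives
                cs.append(0.5 * np.log((Wp + 1e-9) / (Wn + 1e-9)))  # smoothed confidence
                z += 2.0 * np.sqrt(Wp * Wn).sum()  # Z criterion minimized each round
            if best is None or z < best[0]:
                best = (z, j, cs[0], cs[1])
        _, j, c0, c1 = best
        H.append((j, c0, c1))
        pred = np.where(X[:, j:j + 1], c1, c0)  # per-label real-valued outputs
        D *= np.exp(-Y * pred)                  # reweight (doc, label) pairs
        D /= D.sum()
    return H

def score(doc, H, vocab):
    """Combined real-valued score f(x, y) for every label y."""
    f = 0.0
    for j, c0, c1 in H:
        f = f + (c1 if vocab[j] in doc else c0)
    return f
```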

Cited by 1,725 publications (67 citation statements)
References: 27 publications
“…A boosting-based text categorization algorithm was proposed by R. E. Schapire and Y. Singer [12], which laid the foundation for later research on text classification with boosting, but its weak hypotheses are determined only by the presence or absence of features, and it mainly targets English text classification. Junli Chen et al. proposed a boosting-based multi-label classification algorithm that can effectively solve the Chinese text classification problem [14], but that algorithm does not select the best weak hypothesis in each round of the iteration.…”
Section: Related Work (mentioning)
confidence: 99%
“…When researchers use the AdaBoost algorithm for classification, they generally use decision trees as the weak hypotheses. Each feature is regarded as a one-node decision tree whose only test is whether a document contains the feature w [12], and the AdaBoost algorithm returns "+1" or "-1" to tackle binary categorization problems. In this case, it is possible to assign negative samples that contain the feature w to the positive class, and positive samples that lack the feature w to the negative class.…”
Section: Introduction (mentioning)
confidence: 99%
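The one-node stump described in the excerpt above is easy to state concretely. A minimal sketch follows, assuming a discrete ±1-output weak hypothesis keyed to the presence of a single term w; the names are hypothetical:

```python
# Hypothetical illustration of the one-node "word presence" stump described
# above: the weak hypothesis tests only whether a document contains term w.
def presence_stump(w, polarity=+1):
    """Return h(doc) in {+1, -1} based solely on the presence of term w.

    polarity=+1 predicts the positive class when w is present;
    polarity=-1 flips the prediction. As the excerpt notes, such a stump
    can misclassify negative documents that happen to contain w and
    positive documents that lack it.
    """
    def h(doc_tokens):
        return polarity if w in doc_tokens else -polarity
    return h

h = presence_stump("excellent")
print(h({"an", "excellent", "movie"}))  # +1
print(h({"a", "dull", "movie"}))        # -1
```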
“…One straightforward choice is to use zero as the calibration constant [7,8]. An alternative choice for the calibration constant is 0.5, when the multi-label learned model f(x, y) represents the posterior probability of y being a proper label of x [10,11,23].…”
Section: ELM-ML (mentioning)
confidence: 99%
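A short sketch of the calibration idea in the excerpt above: a constant t turns the real-valued scores f(x, y) into a predicted label set, with t = 0 for margin-style scores and t = 0.5 for posterior-probability outputs. The function and variable names here are illustrative:

```python
# Hypothetical sketch: thresholding a multi-label scorer f(x, y) with a
# calibration constant t. t = 0.0 suits margin-style scores; t = 0.5 suits
# models whose output is the posterior probability P(y is a label of x).
def predict_labels(f, x, labels, t=0.0):
    return [y for y in labels if f(x, y) > t]

# Toy posterior-style scorer, so t = 0.5 is the natural calibration constant.
scores = {"sports": 0.9, "politics": 0.2, "finance": 0.6}
f = lambda x, y: scores[y]
print(predict_labels(f, "doc-1", list(scores), t=0.5))  # ['sports', 'finance']
```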
“…Problem transformation methods are convenient and fast to implement, owing to the number of existing techniques and their freely available software. Representative algorithms include Binary Relevance [7], AdaBoost.MH [8], Calibrated Label Ranking [3], Random k-labelsets [9], etc.…”
Section: Introduction (mentioning)
confidence: 99%
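Binary Relevance, the first transformation method listed in the excerpt above, decomposes a multi-label task into one independent binary problem per label. A minimal sketch, assuming scikit-learn is available and using logistic regression as an arbitrary base classifier:

```python
# Minimal Binary Relevance sketch: one independent binary classifier per
# label. Assumes scikit-learn; LogisticRegression is an arbitrary choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    def fit(self, X, Y):                 # Y: (n_samples, n_labels) in {0, 1}
        self.models = [LogisticRegression().fit(X, Y[:, j])
                       for j in range(Y.shape[1])]
        return self

    def predict(self, X):                # stack the per-label decisions
        return np.column_stack([m.predict(X) for m in self.models])

X = np.random.randn(100, 5)
Y = (np.random.rand(100, 3) > 0.5).astype(int)
print(BinaryRelevance().fit(X, Y).predict(X[:2]))
```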
“…For example, Srivastava et al. point out that deep neural networks (DNNs) with a large number of parameters are powerful machine learning models but suffer severely from overfitting [3]. To address the overfitting problem and pursue better classification performance, several ensemble approaches have been proposed, such as BoosTexter and Bonzaiboost [4,5]. However, the accuracy of these methods is far from satisfactory, and large networks such as DNNs are slow to use, making it difficult to deal with overfitting by combining many different large neural nets [3].…”
Section: Introduction (mentioning)
confidence: 99%