2005
DOI: 10.1007/s10994-005-0911-3
Maximum Entropy Models with Inequality Constraints: A Case Study on Text Categorization

Abstract: Data sparseness, or overfitting, is a serious problem in natural language processing employing machine learning methods. This is still true even for the maximum entropy (ME) method, whose flexible modeling capability has alleviated data sparseness more successfully than other probabilistic models in many NLP tasks. Although, with the ME method, we usually estimate the model so that it completely satisfies the equality constraints on feature expectations, complete satisfaction leads to undesirable ove…
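
The truncated abstract refers to relaxing the ME equality constraints into inequalities. As a sketch of that contrast (the widths A_i, B_i follow the paper's general idea, but the exact notation here is mine):

```latex
% Standard ME: the model expectation must match the empirical expectation exactly.
E_p[f_i] \;=\; E_{\tilde{p}}[f_i] \qquad \text{for every feature } f_i
% Inequality ME: the match only has to hold up to feature-specific widths.
-B_i \;\le\; E_{\tilde{p}}[f_i] - E_p[f_i] \;\le\; A_i, \qquad A_i,\, B_i > 0
```

Features whose inequality constraints are inactive at the solution receive zero weight, which is the source of the sparser models this relaxation yields compared with equality-constrained ME.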

Cited by 46 publications (39 citation statements). References 22 publications.
“…The MME for social emotion classification can be extended to a generalized model with relaxed constraints and the L2-norm penalty [13], which has also been employed in the soft-margin extension for support vector machines to improve the predictive power. The generalized model gMME is formulated as the following optimization problem:…”
Section: Model Description (mentioning)
confidence: 99%
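
The quoted passage truncates before the optimization problem itself. One plausible shape for a relaxed, L2-penalized ME objective (a sketch in my own notation, not necessarily the exact gMME of the citing paper) is:

```latex
\max_{p,\;\xi}\; H(p) \;-\; C \sum_i \xi_i^2
\quad \text{s.t.} \quad \bigl| E_p[f_i] - E_{\tilde{p}}[f_i] \bigr| \;\le\; \xi_i, \qquad \xi_i \ge 0
```

Here the slack variables ξ_i soften the feature-expectation constraints, and the penalty C Σ ξ_i² discourages large violations, the same role slacks play in soft-margin SVMs, which is the analogy the quote draws.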
“…Then, the connections between words and social emotions are estimated by the principle of maximum entropy (ME), whose flexible modeling and efficient learning capabilities have alleviated data sparseness more successfully than other probabilistic models [13]. However, the data sparseness problem cannot be solved completely, even with the ME method based on all user ratings over the full set of emotion labels (referred to as multi-label maximum entropy, MME).…”
Section: Introduction (mentioning)
confidence: 99%
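
For context, the conditional ME model referred to here has the standard log-linear form (standard notation, not taken from the citing paper), with d a document, e an emotion label, and f_i joint feature functions:

```latex
p_{\lambda}(e \mid d) \;=\; \frac{1}{Z_{\lambda}(d)} \exp\Bigl( \sum_i \lambda_i f_i(d, e) \Bigr),
\qquad
Z_{\lambda}(d) \;=\; \sum_{e'} \exp\Bigl( \sum_i \lambda_i f_i(d, e') \Bigr)
```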
“…(2) was set to ω/|D| (referred to as 'single width' in Ref. [22]); we omit the hyper-parameters when clear from context.…”
Section: Settings (mentioning)
confidence: 99%
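
Under the "single width" convention quoted above, every inequality constraint shares one bound that shrinks as the training set grows. A minimal Python sketch (the function name and interface are hypothetical; only the ω/|D| formula comes from the quote):

```python
# Hypothetical helper for the "single width" setting: all features share
# the same constraint width omega / |D|, so constraints tighten with data size.
def single_width(omega: float, num_examples: int) -> float:
    """Shared width A_i = B_i = omega / |D| for every feature (assumed convention)."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return omega / num_examples

# Example: omega = 1.0 on a 10,000-document training set gives width 1e-4.
print(single_width(1.0, 10_000))
```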
“…ℓ1-regularized log-linear models (ℓ1-LLMs) provide sparse solutions, in which the weights of irrelevant features are exactly zero as a result of assuming a Laplacian prior on the weights [46], [49]. However, Kazama and Tsujii [22] have reported in a text categorization task that most features regarded as irrelevant during the training of ℓ1-LLMs appeared rarely in the task. In such a case, ℓ1-regularization cannot greatly reduce the number of active features in each classification while retaining the classification accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
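
The sparsity behavior described in this quote is easy to reproduce with off-the-shelf tools. A minimal sketch (synthetic data, not the task from Ref. [22]; scikit-learn's liblinear solver stands in for an ℓ1-LLM trainer):

```python
# Minimal sketch: an l1-regularized log-linear model (logistic regression)
# drives most irrelevant feature weights exactly to zero.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: only 10 of 100 features are actually informative.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

active = np.count_nonzero(clf.coef_)
print(f"active features: {active} / {clf.coef_.size}")  # typically far below 100
```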
“…Text categorization is a crucial and well-proven instrument for dealing with and organizing large amounts of textual information. In recent years there has been extensive study and rapid progress in automatic text categorization, including Naïve Bayes [1], decision trees [2], k-nearest neighbors [3], maximum entropy models [4,5], and fuzzy-theory-based approaches [6]. The support vector machine (SVM) is very popular and has proved to be one of the best algorithms for text categorization [7,8].…”
Section: Introduction (mentioning)
confidence: 99%
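
As a concrete illustration of the SVM approach the quote singles out, here is a minimal scikit-learn pipeline (the four-document corpus is a toy stand-in, not data from any cited study):

```python
# Toy text categorization with TF-IDF features and a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["stocks rallied on strong earnings",
        "the team won the final match",
        "central bank raises interest rates",
        "striker scores twice in the derby"]
labels = ["finance", "sports", "finance", "sports"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["bank cuts interest rates"]))  # expected: ['finance']
```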