Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
DOI: 10.3115/1119355.1119375

A fast algorithm for feature selection in conditional maximum entropy modeling

Abstract: This paper describes a fast algorithm that selects features for conditional maximum entropy modeling. Berger et al. (1996) present an incremental feature selection (IFS) algorithm, which computes the approximate gains for all candidate features at each selection stage and is very time-consuming for problems with large feature spaces. In this new algorithm, instead, we only compute the approximate gains for the top-ranked features based on the models obtained from previous stages. Experiments on WSJ data …
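The idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' published procedure: approximate_gain and refit_with are hypothetical placeholders for the gain approximation and the single-feature refit used in conditional maximum entropy training, and the queue-based reuse of stale gains is one plausible way to realize "only compute the approximate gains for the top-ranked features."

```python
import heapq
from itertools import count

def approximate_gain(model, feature, data):
    """Hypothetical placeholder: approximate log-likelihood gain from adding
    `feature` to `model` while holding the existing feature weights fixed."""
    raise NotImplementedError

def refit_with(model, feature, data):
    """Hypothetical placeholder: return a new model with `feature` added and
    its weight estimated."""
    raise NotImplementedError

def select_features(model, candidates, data, n_select):
    """Sketch of selective gain computation: after the initial pass, gains
    are recomputed only for candidates that rise to the top of the queue."""
    tie = count()  # tie-breaker so the heap never compares feature objects
    # Initial stage: score every candidate once against the starting model.
    heap = [(-approximate_gain(model, f, data), next(tie), 0, f)
            for f in candidates]
    heapq.heapify(heap)
    selected = []
    for stage in range(1, n_select + 1):
        while heap:
            neg_gain, _, computed_at, f = heapq.heappop(heap)
            if computed_at == stage:
                # Gain was measured against the current model and still tops
                # the queue, so this feature wins the stage.
                selected.append(f)
                model = refit_with(model, f, data)
                break
            # Stale gain: refresh it against the current model and push the
            # feature back; lower-ranked candidates keep their old estimates.
            heapq.heappush(
                heap, (-approximate_gain(model, f, data), next(tie), stage, f))
    return model, selected
```

The tie-break counter only keeps the heap from comparing feature objects; the essential point is that a feature's gain is refreshed against the current model only when it reaches the top of the queue.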

Cited by 17 publications (24 citation statements); references 5 publications.

Citation statements:
“…Many other feature selection methods have been proposed both for general settings (see, e.g., Yang & Pedersen, 1997, for a comparative study of these methods for text categorization) and for ME estimation (Berger, Della Pietra, & Della Pietra, 1996; Della Pietra, Della Pietra, & Lafferty, 1997; Shirai et al., 1998; McCallum, 2003; Zhou et al., 2003). They basically order and omit (or add) features, just by observing measures for the predictive power of features such as information gain, χ²-test values, and gain in likelihood (Berger, Della Pietra, & Della Pietra, 1996; Della Pietra, Della Pietra, & Lafferty, 1997; McCallum, 2003; Zhou et al., 2003).…”
Section: Problem and Existing Solutions (mentioning; confidence: 99%)
“…They basically order and omit (or add) features, just by observing measures for the predictive power of features such as information gain, χ²-test values, and gain in likelihood (Berger, Della Pietra, & Della Pietra, 1996; Della Pietra, Della Pietra, & Lafferty, 1997; McCallum, 2003; Zhou et al., 2003). The common problem with these methods is that the ordering is based on a heuristic criterion and ignores the fact that uncertainty is already contained in such measures.…”
Section: Problem and Existing Solutions (mentioning; confidence: 99%)
“…In both cases, the computational requirements for scoring large sets of candidate features are prohibitive. Zhou et al. introduced a modification to Berger et al.'s algorithm to reduce the computational requirements of feature selection [118]. They note that the feature gain estimates are relatively constant between iterations.…”
Section: Forward Selection (mentioning; confidence: 99%)
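For contrast, the baseline forward-selection loop that the quoted passage calls prohibitive might look like the sketch below. It reuses the same hypothetical approximate_gain and refit_with stubs introduced above and re-scores every remaining candidate at every stage, which is the cost the cited modification avoids.

```python
def approximate_gain(model, feature, data):  # hypothetical stub, as above
    raise NotImplementedError

def refit_with(model, feature, data):  # hypothetical stub, as above
    raise NotImplementedError

def ifs_select(model, candidates, data, n_select):
    """Baseline incremental feature selection: every stage re-scores every
    remaining candidate against the current model."""
    selected, remaining = [], list(candidates)
    for _ in range(n_select):
        # O(len(remaining)) gain computations per stage, versus a handful of
        # refreshed gains per stage in the selective sketch above.
        best = max(remaining, key=lambda f: approximate_gain(model, f, data))
        remaining.remove(best)
        selected.append(best)
        model = refit_with(model, best, data)
    return model, selected
```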
“…Further work can be done to evaluate a wider collection of scoring metrics drawn from the feature selection literature. A straightforward experiment that we could try immediately would be to use the pruning method of [118], discussed in related work, to reduce the number of candidate features that we evaluate at each iteration during forward selection.…”
Section: Future Directions (mentioning; confidence: 99%)