Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
DOI: 10.3115/1119355.1119375

A fast algorithm for feature selection in conditional maximum entropy modeling

Abstract: This paper describes a fast algorithm that selects features for conditional maximum entropy modeling. Berger et al. (1996) present an incremental feature selection (IFS) algorithm, which computes the approximate gains for all candidate features at each selection stage and is very time-consuming for problems with large feature spaces. In this new algorithm, instead, we only compute the approximate gains for the top-ranked features based on the models obtained from previous stages. Experiments on WSJ data …
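The idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' published procedure: approximate_gain and refit_with are hypothetical placeholders for the gain approximation and the single-feature refit used in conditional maximum entropy training, and the queue-based reuse of stale gains is one plausible way to realize "only compute the approximate gains for the top-ranked features."

```python
import heapq
from itertools import count

def approximate_gain(model, feature, data):
    """Hypothetical placeholder: approximate log-likelihood gain from adding
    `feature` to `model` while holding the existing feature weights fixed."""
    raise NotImplementedError

def refit_with(model, feature, data):
    """Hypothetical placeholder: return a new model with `feature` added and
    its weight estimated."""
    raise NotImplementedError

def select_features(model, candidates, data, n_select):
    """Sketch of selective gain computation: after the initial pass, gains
    are recomputed only for candidates that rise to the top of the queue."""
    tie = count()  # tie-breaker so the heap never compares feature objects
    # Initial stage: score every candidate once against the starting model.
    heap = [(-approximate_gain(model, f, data), next(tie), 0, f)
            for f in candidates]
    heapq.heapify(heap)
    selected = []
    for stage in range(1, n_select + 1):
        while heap:
            neg_gain, _, computed_at, f = heapq.heappop(heap)
            if computed_at == stage:
                # Gain was measured against the current model and still tops
                # the queue, so this feature wins the stage.
                selected.append(f)
                model = refit_with(model, f, data)
                break
            # Stale gain: refresh it against the current model and push the
            # feature back; lower-ranked candidates keep their old estimates.
            heapq.heappush(
                heap, (-approximate_gain(model, f, data), next(tie), stage, f))
    return model, selected
```

The tie-break counter only keeps the heap from comparing feature objects; the essential point is that a feature's gain is refreshed against the current model only when it reaches the top of the queue.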

Cited by 17 publications (24 citation statements); references 5 publications.

Citation statements:
“…Many other feature selection methods have been proposed both for general settings (see, e.g., Yang & Pedersen, 1997, for a comparative study of these methods for text categorization) and for ME estimation (Berger, Della Pietra, & Della Pietra, 1996; Della Pietra, Della Pietra, & Lafferty, 1997; Shirai et al., 1998; McCallum, 2003; Zhou et al., 2003). They basically order and omit (or add) features, just by observing measures for the predictive power of features such as information gain, χ²-test values, and gain in likelihood (Berger, Della Pietra, & Della Pietra, 1996; Della Pietra, Della Pietra, & Lafferty, 1997; McCallum, 2003; Zhou et al., 2003).…”
Section: Problem and Existing Solutions (mentioning; confidence: 99%)
“…They basically order and omit (or add) features, just by observing measures for the predictive power of features such as information gain, χ²-test values, and gain in likelihood (Berger, Della Pietra, & Della Pietra, 1996; Della Pietra, Della Pietra, & Lafferty, 1997; McCallum, 2003; Zhou et al., 2003). The common problem with these methods is that the ordering is based on a heuristic criterion and ignores the fact that uncertainty is already contained in such measures.…”
Section: Problem and Existing Solutions (mentioning; confidence: 99%)
“…In both cases, the computational requirements for scoring large sets of candidate features are prohibitive. Zhou et al. introduced a modification to Berger et al.'s algorithm to reduce the computational requirements of feature selection [118]. They note that the feature gain estimates are relatively constant between iterations.…”
Section: Forward Selection (mentioning; confidence: 99%)
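For contrast, the baseline forward-selection loop that the quoted passage calls prohibitive might look like the sketch below. It reuses the same hypothetical approximate_gain and refit_with stubs introduced above and re-scores every remaining candidate at every stage, which is the cost the cited modification avoids.

```python
def approximate_gain(model, feature, data):  # hypothetical stub, as above
    raise NotImplementedError

def refit_with(model, feature, data):  # hypothetical stub, as above
    raise NotImplementedError

def ifs_select(model, candidates, data, n_select):
    """Baseline incremental feature selection: every stage re-scores every
    remaining candidate against the current model."""
    selected, remaining = [], list(candidates)
    for _ in range(n_select):
        # O(len(remaining)) gain computations per stage, versus a handful of
        # refreshed gains per stage in the selective sketch above.
        best = max(remaining, key=lambda f: approximate_gain(model, f, data))
        remaining.remove(best)
        selected.append(best)
        model = refit_with(model, best, data)
    return model, selected
```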
“…Further work can be done to evaluate a wider collection of scoring metrics drawn from the feature selection literature. A straightforward experiment that we could try immediately would be to use the pruning method of [118], discussed in related work, to reduce the number of candidate features that we evaluate at each iteration during forward selection.…”
Section: Future Directions (mentioning; confidence: 99%)