1997
DOI: 10.1016/s0004-3702(97)00063-5

Selection of relevant features and examples in machine learning

Abstract: In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.

Cited by 2,649 publications (1,409 citation statements)
References 67 publications
“…However, such search is exponential in the number of radio sources and therefore intractable. Instead, we used a greedy feature selection technique [4] to select a subset of highly relevant radio sources to be used in the Euclidean distance calculation. This greedy technique, albeit not optimal, has been shown to work well in practice [4].…”
Section: Localization Algorithms (mentioning, confidence: 99%)
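The greedy technique this citing paper describes can be read as a forward-selection loop: starting from an empty set of radio sources, repeatedly add the source that most reduces the localization error of a nearest-fingerprint (Euclidean distance) matcher. The sketch below is a minimal illustration under that reading; the array names, the nearest-neighbour evaluator, and the fixed budget `k` are assumptions, not the cited system's actual code.

```python
import numpy as np

def nn_localization_error(train_rssi, train_pos, test_rssi, test_pos, sources):
    """Mean position error of nearest-fingerprint matching restricted to `sources`."""
    errors = []
    for rssi, pos in zip(test_rssi, test_pos):
        # Euclidean distance computed only over the selected radio sources
        dists = np.linalg.norm(train_rssi[:, sources] - rssi[sources], axis=1)
        errors.append(np.linalg.norm(train_pos[np.argmin(dists)] - pos))
    return float(np.mean(errors))

def greedy_source_selection(train_rssi, train_pos, test_rssi, test_pos, k):
    """Greedy forward selection: add the radio source that most reduces error."""
    selected = []
    remaining = list(range(train_rssi.shape[1]))
    while len(selected) < k and remaining:
        best_err, best_src = min(
            (nn_localization_error(train_rssi, train_pos,
                                   test_rssi, test_pos, selected + [s]), s)
            for s in remaining)
        selected.append(best_src)
        remaining.remove(best_src)
    return selected
```

As the quote notes, this greedy procedure is not guaranteed to find the optimal subset, but it avoids the exponential search over all source combinations.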
“…Table 3 summarizes the number of fingerprints collected per floor for each of the buildings. The different number of fingerprints collected per floor is the result of us increasing the number of training and testing fingerprints collected with every new building in the hope of achieving even better localization results. Ironically, as we show in Section 6.3.2, the number of training fingerprints has little bearing on the localization accuracy.…”
Section: Data Collection (mentioning, confidence: 99%)
“…In addition, those selected features are very important, since they can provide novel biological knowledge and insights for biologists to further investigate how they are related to the disease phenotypes. Clearly, there are some standard feature selection techniques (Blum and Langley, 1997; Kohavi and John, 1997; Guyon and Elisseeff, 2003) and classification techniques which can automatically select important features from a large number of input features. For example, some simple and well-known filter-based feature selection methods select features based on the relationship between two random variables.…”
Section: Introduction (mentioning, confidence: 99%)
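One concrete example of such a filter (an assumed illustration, not taken from the citing paper) is to score each feature by a simple statistic relating it to the class label, such as the absolute Pearson correlation, and keep the highest-scoring features.

```python
import numpy as np

def filter_rank(X, y, top_k):
    """Filter-style feature selection: score each column of X by the absolute
    Pearson correlation with the label vector y (labels assumed numeric),
    then return the indices of the top_k highest-scoring features."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k].tolist()
```

Because each feature is scored independently of the learning algorithm, this kind of filter is cheap but ignores feature interactions.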
“…Most of the feature selection algorithms approach the task as a search problem, where each state in the search specifies a distinct subset of the possible attributes (Blum and Langley, 1997). The search procedure is combined with a criterion in order to evaluate the merit of each candidate subset of attributes.…”
Section: Introduction (mentioning, confidence: 99%)
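Read this way, a feature selector needs only two components: a search procedure over subset-states and an evaluation criterion for each candidate subset. A minimal sketch, assuming greedy hill-climbing as the search procedure and a caller-supplied merit function as the criterion, could look like this:

```python
def subset_search(n_features, evaluate, max_size=None):
    """Search over feature subsets: each state is a frozenset of feature
    indices, successor states add one unused feature, and evaluate(subset)
    is the merit criterion (higher is better; it must accept the empty set).
    Greedy hill-climbing is used here purely as one possible search procedure."""
    max_size = max_size if max_size is not None else n_features
    current, current_score = frozenset(), evaluate(frozenset())
    while len(current) < max_size:
        successors = [current | {f} for f in range(n_features) if f not in current]
        if not successors:
            break
        best = max(successors, key=evaluate)
        best_score = evaluate(best)
        if best_score <= current_score:   # no successor improves the criterion: stop
            break
        current, current_score = best, best_score
    return current, current_score
```

Swapping in a different successor rule or evaluation function yields other members of the same family (backward elimination, best-first search, wrapper versus filter criteria) without changing the overall framing.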
“…It searches for features better suited to the mining algorithm, aiming to improve mining performance, but it is also more computationally expensive (Langley, 1994; Kohavi and John, 1997) than filter models. Feature ranking (FR), also called feature weighting (Blum and Langley, 1997; Guyon and Elisseeff, 2003), assesses individual features and assigns them weights according to their degrees of relevance, while feature subset selection (FSS) evaluates the goodness of each found feature subset. (Unusually, some search strategies in combination with subset evaluation can provide a ranked list.)…”
Section: Introduction (mentioning, confidence: 99%)
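The FR/FSS distinction drawn here corresponds to two different interfaces, sketched below under assumed names: a ranking function scores each feature in isolation and returns per-feature weights, while a subset selector evaluates each candidate subset as a whole with a merit function.

```python
import numpy as np

def feature_weights(X, y, relevance):
    """Feature ranking (FR): score every feature independently with a
    relevance(feature_column, y) function and return per-feature weights."""
    return np.array([relevance(X[:, j], y) for j in range(X.shape[1])])

def best_subset(candidate_subsets, merit):
    """Feature subset selection (FSS): evaluate each whole candidate subset
    with a merit(subset) function and return the best-scoring one."""
    return max(candidate_subsets, key=merit)

# Hypothetical usage, assuming X is an (n_samples, n_features) array and y numeric labels:
# weights = feature_weights(X, y, lambda col, t: abs(np.corrcoef(col, t)[0, 1]))
# ranking = np.argsort(weights)[::-1]                    # FR output: ordered feature list
# chosen = best_subset([{0, 2}, {1, 3}],
#                      lambda s: weights[list(s)].mean())  # FSS output: one subset
```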