Overfitting and undercomputing in machine learning (1995)
DOI: 10.1145/212094.212114

Cited by 599 publications (302 citation statements); references 1 publication. Citing publications span 2000–2024.
“…A model that over-fits the data is one that has been influenced by random error and noise in the training data to the extent that it does not accurately reflect the underlying phenomenon being studied [68]. Overfitting is a particular concern in models where the number of dimensions greatly outweighs the sample size and in highly complex models.…”
Section: Accepted Manuscript (mentioning)
confidence: 99%
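A minimal sketch (assuming NumPy; not taken from the cited work) of the behaviour this statement describes: on a small noisy sample, a high-degree polynomial has more free parameters than the data can support, so it fits the training noise and typically generalises worse than a simple line even though its training error is lower.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underlying phenomenon: y = 2x, observed with additive noise.
x_train = rng.uniform(0, 1, 15)
y_train = 2 * x_train + rng.normal(0, 0.3, size=15)
x_test = rng.uniform(0, 1, 200)
y_test = 2 * x_test + rng.normal(0, 0.3, size=200)

for degree in (1, 10):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-10 fit is guaranteed a training error no worse than the line (the line is nested within it), while its test error will usually be higher, which is the overfitting pattern the quoted passage warns about when model dimensions outweigh the sample size.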
“…Before using the big data for analysis of the system's state and real-time fault detection, pre-processing, which requires the replacement of missing values, removal of incomplete rows and columns, outliers and extreme values, was done. This process of data cleaning can also involve data integration, transformation, reduction and discretization, to make the analysis fast and prevent bogus results [37]. Hence, redundant input variables such as those that were constant were removed, and missing values were replaced with zeros and by averaging the nearest neighbors' values of the missing value cells.…”
Section: Fault Detection and Identification with ANN (mentioning)
confidence: 99%
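A minimal sketch (assuming pandas; the column names are hypothetical and the cited pipeline is not reproduced here) of the cleaning steps the statement lists: dropping constant (redundant) input variables and replacing missing values, either with zeros or with an average of neighbouring samples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sensor_a": [1.0, np.nan, 3.0, 4.0],
    "sensor_b": [5.0, 5.0, 5.0, 5.0],     # constant column: carries no signal
    "sensor_c": [np.nan, 2.0, np.nan, 8.0],
})

# Remove redundant input variables (constant columns).
df = df.loc[:, df.nunique(dropna=True) > 1]

# Option 1: replace missing values with zeros.
filled_zeros = df.fillna(0.0)

# Option 2: average neighbouring samples; linear interpolation between
# the surrounding valid values stands in for nearest-neighbour averaging.
filled_neighbours = df.interpolate(limit_direction="both")

print(filled_zeros)
print(filled_neighbours)
```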
“…Thus, in the applications for high-dimensional datasets filtering methods are often combined with the simplest or fastest learners. This is due to the fact that learning parameters usually involved in choosing complex learners may make the selection process infeasible or may result in overfitting [33]. Consequently, most frequently used models have historically been naive Bayes [34], K-nearest neighbor models or decision trees, typically C4.5 [35].…”
Section: Classification Models in Evaluating Feature Selection Algorithms (mentioning)
confidence: 99%
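A minimal sketch (assuming scikit-learn; the dataset and parameters are illustrative, not those of the cited study) of the pattern the statement describes: a fast univariate filter paired with a simple learner such as naive Bayes on high-dimensional data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# High-dimensional toy data: far more features than informative ones.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, random_state=0)

# Filter method: keep the 20 top-scoring features by ANOVA F-value,
# then fit a naive Bayes classifier on the reduced representation.
model = make_pipeline(SelectKBest(f_classif, k=20), GaussianNB())
scores = cross_val_score(model, X, y, cv=5)
print(f"mean 5-fold CV accuracy: {scores.mean():.3f}")
```

Placing the selector inside the pipeline keeps feature scoring within each cross-validation fold, which avoids the selection-induced overfitting the quoted passage cautions against.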