2017
DOI: 10.1002/wics.1414

Recent advances in scaling‐down sampling methods in machine learning

Abstract: Data sampling methods have been investigated for decades in the context of machine learning and statistical algorithms, with significant progress made in the past few years driven by strong interest in big data and distributed computing. Most recently, progress has been made in methods that can be broadly categorized into random sampling, including density-biased and nonuniform sampling methods; active learning methods, which are a type of semi-supervised learning and an area of intense research; and progressive sampling methods, which can be viewed as a combination of the above two approaches.

Cited by 24 publications (13 citation statements)
References 122 publications (130 reference statements)

“…As observed in the review of [18], data sampling methods for machine learning have been investigated for decades. According to that paper, progress has been made in recent years in methods that can be broadly categorized into random sampling, including density-biased and nonuniform sampling methods; active learning methods, which are a type of semi-supervised learning; and progressive sampling methods, which can be viewed as a combination of the above two approaches.…”
Section: Related Work (mentioning)
confidence: 99%
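The density-biased (nonuniform) sampling that the review groups under random sampling can be made concrete with a short sketch. The example below is a hypothetical illustration, not the review's algorithm: it assumes scikit-learn's KernelDensity for the density estimate, and the function name, bandwidth, and inverse-density weighting are all assumptions chosen for brevity. Points in dense regions are down-weighted so that sparse regions are better represented in the subsample.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def density_biased_sample(X, n_samples, bandwidth=1.0, seed=0):
    """Draw n_samples rows of X with probability roughly inversely
    proportional to the estimated local density, so sparse regions
    are over-represented relative to uniform random sampling."""
    rng = np.random.default_rng(seed)
    kde = KernelDensity(bandwidth=bandwidth).fit(X)
    density = np.exp(kde.score_samples(X))       # estimated p(x_i)
    weights = 1.0 / np.maximum(density, 1e-12)   # bias toward sparse regions
    probs = weights / weights.sum()
    idx = rng.choice(len(X), size=n_samples, replace=False, p=probs)
    return X[idx]

# Usage: keep 500 of 10,000 points, over-sampling the tails.
X = np.random.default_rng(0).normal(size=(10_000, 2))
subset = density_biased_sample(X, n_samples=500, bandwidth=0.5)
```

Inverse-density weighting is only one choice; density-biased schemes in the literature differ mainly in how the sampling weights are derived from the density estimate.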
“…In the case of optimization, the experiment can be conducted using a selected optimization algorithm, which is a mature field. In the case of constructing an accurate model for advanced process development or predictive maintenance, however, data-efficient sampling in semiconductor manufacturing has been studied far less than optimization, even though sampling has been an intensively studied subject in other fields. Prior work on data-efficient sampling in semiconductor processing has mainly addressed yield improvement, quality control, and predictive maintenance in a production-line setting. Sampling strategies for developing key process steps in advanced technology nodes, on the other hand, have received little study, and we found only limited literature.…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus, an active learning algorithm seeks examples that can immediately shrink the set of consistent hypotheses, and the two most widely used methods are query by disagreement and query by committee. In the variance reduction method, active data selection is used to minimize the predictive variance of the learner based on Fisher information. However, its computational complexity makes it impractical when a large number of parameters is present. Hence, it has received little interest in practical applications of efficient sampling in ML problems.…”
Section: Introduction (mentioning)
confidence: 99%
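Query by committee, named above as one of the two most widely used disagreement-based methods, admits a compact sketch. The following is a hypothetical illustration rather than the cited implementation: the bagged decision-tree committee, the vote-entropy disagreement measure, and the scikit-learn (>= 1.2) API are all assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def query_by_committee(X_lab, y_lab, X_pool, n_members=5, seed=0):
    """Return the index of the pool point the committee disagrees
    on most, measured by vote entropy."""
    committee = BaggingClassifier(
        estimator=DecisionTreeClassifier(),
        n_estimators=n_members,
        random_state=seed,
    ).fit(X_lab, y_lab)
    # votes[m, i] = class predicted for pool point i by member m
    votes = np.stack([m.predict(X_pool) for m in committee.estimators_])
    entropy = np.zeros(len(X_pool))
    for c in np.unique(y_lab):
        frac = (votes == c).mean(axis=0)        # vote share for class c
        safe = np.where(frac > 0, frac, 1.0)    # log(1) = 0 avoids log(0)
        entropy -= frac * np.log(safe)
    return int(np.argmax(entropy))              # most contested point
```

Vote entropy is one common disagreement measure; query by disagreement instead tracks the region of the input space where consistent hypotheses conflict.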
“…The Variance Reduction Technique (VRT) has been proposed as the most effective approach to improving sampling efficiency [13,14]. Although VRT increases sampling efficiency, it does not use the minimum sample size for UA [15]. Progressive Sampling Techniques (PST) address the problem of finding the fewest possible samples, as the sketch below illustrates.…”
Section: Introduction (mentioning)
confidence: 99%
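A minimal sketch of progressive sampling, as characterized above, trains on geometrically growing subsamples and stops once the held-out score stops improving. The doubling schedule, the tolerance, and the logistic-regression learner below are illustrative assumptions, not the specific PST of [15].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def progressive_sample_size(X_tr, y_tr, X_val, y_val,
                            n0=100, growth=2.0, tol=1e-3, seed=0):
    """Return the first sample size on a geometric schedule whose
    validation score improves on the previous size by at most tol."""
    rng = np.random.default_rng(seed)
    n, prev_score = n0, -np.inf
    while n <= len(X_tr):
        idx = rng.choice(len(X_tr), size=n, replace=False)
        model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
        score = model.score(X_val, y_val)
        if score - prev_score <= tol:   # learning curve has flattened
            return n
        prev_score, n = score, int(n * growth)
    return len(X_tr)                    # never converged; use all data
```

The stopping rule makes the connection to the review's framing explicit: each round is a random subsample, while the decision of whether to keep sampling is driven by the model's own performance, which is what lets PST be viewed as combining random sampling with an active, model-guided component.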