2019
DOI: 10.3906/elk-1807-212

Optimal training and test sets design for machine learning

Abstract: In this paper, we describe histogram matching, a metric for measuring the distance between two datasets with exactly the same features, and embed it into a mixed integer programming formulation to partition a dataset into fixed-size training and test subsets. The partition is chosen so that the pairwise distances between the dataset and the subsets are minimized with respect to histogram matching. We then conduct a numerical study using a well-known machine learning dataset. We demonstrate that the training set con…
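The abstract is truncated and does not give the formula for histogram matching, so the following is only a minimal sketch of one plausible reading: bin each feature over the range of the full dataset, normalize the counts, and sum the absolute differences between the full dataset's histogram and the subset's. The function names, the equal-width binning, and the L1 difference are all assumptions made for illustration, not the paper's definition. The sketch scores a given candidate split; it does not implement the mixed integer program that searches for the best one.

```python
def histogram(values, lo, hi, n_bins):
    """Relative frequency of `values` in `n_bins` equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        # The top edge is folded into the last bin.
        idx = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def histogram_distance(full, subset, n_bins=5):
    """Assumed distance: sum over features of the L1 gap between histograms.

    `full` and `subset` are lists of rows with the same features;
    `subset` is expected to be drawn from `full`.
    """
    total = 0.0
    for j in range(len(full[0])):
        col_full = [row[j] for row in full]
        col_sub = [row[j] for row in subset]
        lo, hi = min(col_full), max(col_full)
        if hi == lo:
            hi = lo + 1.0  # constant feature: avoid a zero-width range
        h_full = histogram(col_full, lo, hi, n_bins)
        h_sub = histogram(col_sub, lo, hi, n_bins)
        total += sum(abs(a - b) for a, b in zip(h_full, h_sub))
    return total


# Toy data: one feature, four rows, two bins.
data = [[0.0], [1.0], [2.0], [3.0]]
balanced = [[0.0], [3.0]]   # one row from each bin
lopsided = [[0.0], [1.0]]   # both rows from the lower bin
print(histogram_distance(data, balanced, n_bins=2))
print(histogram_distance(data, lopsided, n_bins=2))
```

Under this assumed metric, a subset whose per-feature histograms mirror the full dataset scores 0, and larger values indicate a less representative subset.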

Cited by 29 publications (9 citation statements)
References 16 publications
“…Finally, each of the five categories was made up of 1349 images, for a total of 6745 images. Of these, 80% were randomly selected for training, 10% for validation and 10% for testing [16].…”
Section: Methods (mentioning)
confidence: 99%
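The 80/10/10 random split described in the statement above can be sketched as follows. The quoted text does not say how the fractional share was rounded (10% of 6745 is 674.5) or whether the split was stratified by category, so this sketch shuffles the indices once, uses integer arithmetic for the training and validation sizes, and gives the remainder to the test set. The function name, the seed, and the rounding rule are assumptions.

```python
import random


def split_80_10_10(n_items, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG keeps the split reproducible
    n_train = n_items * 80 // 100         # integer arithmetic avoids float rounding
    n_val = n_items * 10 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_80_10_10(6745)
print(len(train), len(val), len(test))  # 5396 674 675
```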
“…To estimate the model, the data set can be divided into training and test data in ratios such as 1:1, 2:1 70:30, 60:40 [57], 66:34 [18] according to the user's purpose. Here, it is generally preferred that the training set consists of as much data as possible in order to obtain a stronger model [18,57,58]. A certain amount of the data set (20% -30%) is kept for testing data, which is called the storage procedure, and then the remaining amount can be used for training.…”
Section: Evaluation Of Data (mentioning)
confidence: 99%
“…Machine learning (ML) depends on computational statistics, the main idea of ML is making predictions using computers. Machine learning algorithms create a mathematical predictive model that depends on a sample of data, known as "training dataset" [1]. Also, predictions or decisions made without explicit programming is another benefit of machine learning.…”
Section: Introduction (mentioning)
confidence: 99%