2022
DOI: 10.48550/arxiv.2202.03238
Preprint

Towards an Analytical Definition of Sufficient Data

Abstract: We show that, for each of five datasets of increasing complexity, certain training samples are more informative of class membership than others. These samples can be identified prior to training by analyzing their position in reduced dimensional space relative to the classes' centroids. Specifically, we demonstrate that samples nearer the classes' centroids are less informative than those that are farthest from them. For all five datasets, we show that there is no statistically significant difference between …
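
The selection idea in the abstract (rank each sample by its distance from its own class centroid in the reduced space, then favor the most distant ones) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: the points are taken to be already reduced to a low-dimensional space (the abstract does not name the reduction method), the Euclidean metric is assumed, and the function names `class_centroids` and `select_furthest`, as well as the `fraction` parameter, are hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(points, labels):
    """Return the mean coordinate vector of each class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }


def select_furthest(points, labels, fraction):
    """Return sorted indices of the `fraction` of each class that lies
    furthest from that class's own centroid (at least one per class)."""
    centroids = class_centroids(points, labels)
    by_class = defaultdict(list)
    for index, (point, label) in enumerate(zip(points, labels)):
        # Euclidean distance is an assumption; the abstract names no metric.
        by_class[label].append((math.dist(point, centroids[label]), index))
    kept = []
    for scored in by_class.values():
        scored.sort(reverse=True)  # most distant samples first
        count = max(1, math.ceil(fraction * len(scored)))
        kept.extend(index for _, index in scored[:count])
    return sorted(kept)


# Toy data: two classes of four 2-D points, each with one clear outlier.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (10, 10), (11, 10), (10, 11), (4, 4)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(select_furthest(points, labels, 0.25))  # [3, 7]
```

Selecting per class, rather than across the pooled dataset, keeps every class represented in the retained subset; whether the paper does the same is not stated in the truncated abstract.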

citations
Cited by 1 publication
(1 citation statement)
references
References 13 publications
“…In this work, we will be defining a density measurement that can be applied to the 3-dimensional reduction of each class in a dataset independently and then providing a study of classification accuracy after reducing the data in each class to a variety of target densities as opposed to reducing each class by the same amount as in [9].…”
mentioning
confidence: 99%