2014
DOI: 10.1016/j.ins.2013.08.059

Towards UCI+: A mindful repository design

Cited by 56 publications (46 citation statements)
References 22 publications
“…The repository is publicly available and is regularly used in machine learning research. The usage procedure, which is referred as "The UCI test" [118] or the "de facto approach" [44] [96], follows the general form of equation 1 where M is the repository, p is the choice of datasets and R is one particular performance metric (accuracy, AUC, Brier score, F-measure, MSE, etc. [53,76]).…”
Section: Pedestrian Benchmarks
Citation type: mentioning
confidence: 99%
“…It is a distortion-based generator (similar to Soares's UCI++ [153]). Finally, [118] suggest ideas about sharing and arranging the results of previous evaluations so that each new algorithm can be compared immediately with many other algorithms using the same experimental setting. This idea of 'experiment database' [168] has already been set up.…”
Section: Pedestrian Benchmarks
Citation type: mentioning
confidence: 99%
“…As most of them are real datasets, it is not possible to assert that they are noiseless, although some of them are artificial and show no label inconsistency. Nonetheless, a recent study showed that most of the datasets from UCI can be regarded as easy problems, once many classification techniques are able to attain high accuracies when applied to them [32]. Table 2 summarizes the main characteristics of these datasets: number of examples (Examples), number of features (Features), number of classes (Class) and percentage of the examples in the majority class (%MC).…”
Section: Datasets
Citation type: mentioning
confidence: 99%
“…Therefore, the noise rates artificially added could not match to the rate of noise present in the data. The predictive performance of a classifier for a particular dataset is often associated with the difficulty of the classification problem represented by this dataset (Lorena et al, 2012; Maciá & Bernadó-Mansilla, 2014). It is intuitive that for easy classification problems it is also easy to obtain a plausible and highly accurate classification hypothesis, while the opposite is verified for difficult problems.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Because they are real, it is not possible to assert that they are noise-free, although some of them are artificial and show no label inconsistencies. Nonetheless, a recent study showed that most of the datasets from UCI can be considered easy problems, once many classification techniques are able to obtain high predictive accuracies when applied to them (Maciá & Bernadó-Mansilla, 2014). Table 2.2 summarizes the main characteristics of the datasets used in the experiments of this Thesis: number of examples (#EX), number of features (#FT), number of classes (#CL) and percentage of the examples in the majority class (%MC).…”
Section: Datasets
Citation type: mentioning
confidence: 99%