2021
DOI: 10.3390/app11094132

Machine Learning Methods with Noisy, Incomplete or Small Datasets

Abstract: In this article, we present a collection of fifteen novel contributions on machine learning methods with low-quality or imperfect datasets, which were accepted for publication in the special issue “Machine Learning Methods with Noisy, Incomplete or Small Datasets”, Applied Sciences (ISSN 2076-3417). These papers provide a variety of novel approaches to real-world machine learning problems where available datasets suffer from imperfections such as missing values, noise or artefacts. Contributions in applied sci…

Cited by 20 publications (12 citation statements)
References 15 publications

“…The creating and assessing process of these models consists of feature selection, optimization of model parameters using the train data, and evaluation of the model using the test data, in the training and testing phases, respectively. Also, synthetic minority oversampling technique 23 is applied for balancing the train data. This process divides the data set into 10 nonoverlapping folds.…”
Section: Methods (mentioning)
confidence: 99%
“…Despite the growing pervasiveness level of big data, there are still challenges to accessing a high-quality training set. Data sharing agreements, violation of privacy [584], [585], noise problem [586], [587], poor data quality(fit for purpose) [588], imbalance of data [589], and lack of annotated datasets are number of challenges businesses face seeking raw data. Oversampling, undersampling, dynamic sampling [590] for imbalanced data, Surrogate Loss, Data Cleaning, finding distribution in solving the problem of learning from noisy labels for noisy data sets, and active learning [591] for lack of annotated data are a number of methods have been proposed to alleviate these problems.…”
Section: Scalability (mentioning)
confidence: 99%
“…A small dataset can sustain issues in supervised learning scenarios. This kind of dataset is termed a low-quality data problem [49]. Due to the lack of open and comprehensive solid waste generation data, we utilized the abovementioned data set.…”
Section: Solid Waste Dataset (mentioning)
confidence: 99%