2010
DOI: 10.1007/s10462-010-9156-z

A study of the effect of different types of noise on the precision of supervised learning techniques

Abstract: Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models. Therefore, effectively dealing with noise is a key aspect in supervised learning to obtain reliable models from data. Although several authors have studied the effect of noise for some particular learners, comparisons of its effect among different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised le…
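The kind of experiment the abstract describes, injecting controlled amounts of noise and comparing how strongly each learner degrades, can be illustrated with a small sketch. This is a minimal sketch, assuming scikit-learn stand-ins for the four algorithms named in the citing literature (GaussianNB for naïve Bayes, DecisionTreeClassifier for C4.5, KNeighborsClassifier for IBk, SVC for SMO), a synthetic dataset, and simple random label-flipping as the noise model; the study's actual noise types, datasets, and protocol may differ.

```python
# Sketch: measure how increasing class-label noise in the training data
# degrades test accuracy for four common supervised learners. The learners,
# dataset, and noise model are illustrative assumptions, not the cited
# paper's exact setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(y, rate, rng):
    """Randomly flip a fraction `rate` of binary class labels (class noise)."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

learners = {
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
}

for rate in (0.0, 0.1, 0.2, 0.3, 0.4):
    y_noisy = flip_labels(y_tr, rate, rng)   # corrupt only the training labels
    scores = {name: accuracy_score(y_te, clf.fit(X_tr, y_noisy).predict(X_te))
              for name, clf in learners.items()}
    print(f"noise={rate:.1f}  " +
          "  ".join(f"{n}={s:.3f}" for n, s in scores.items()))
```

Only the training labels are corrupted so that the clean test set measures how well each learner resists the noise rather than how well it reproduces it.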

Cited by 369 publications (208 citation statements)
References 18 publications
“…Because there are numerous practical challenges, they cannot be simply treated as black-box (simply enter the input features). Some of the difficulties are: large dimensionality of feature vectors [50], bias/variance dilemma [51], input and output noise [52], large-scale training data [53], data heterogeneity [54], data redundancy [55] and non-linearity among features [56].…”
Section: Supervised Learning
confidence: 99%
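The "input and output noise" item in this excerpt distinguishes attribute (feature) noise from class (label) noise, the two noise sources varied in the cited study. Below is a minimal sketch of injecting attribute noise, assuming a zero-mean Gaussian corruption scaled to each feature's spread (an illustrative choice, not the cited paper's exact noise model); label-flipping, the output-noise counterpart, appears in the earlier sketch.

```python
# Minimal sketch of attribute (input) noise: perturb each feature of a NumPy
# array with zero-mean Gaussian noise scaled to that feature's standard
# deviation. The 10% scale is an illustrative assumption.
import numpy as np

def add_attribute_noise(X, scale=0.1, seed=1):
    """Return a copy of X with Gaussian input noise added to every feature."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, scale * X.std(axis=0), size=X.shape)
```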
“…In most applications, each parameter (or variable) will represent a column of the data matrix and the timestamp will be an additional variable. Sometimes, some pre-processing is required to integrate the information of several sensors in a single indicator (like mean water level of a bioreactor for example, measured by several sensors in different points) and the imprecision or uncertainty associated to the measurements must be properly treated at pre-processing level, in particular the noise associated to the signals [168]. Regarding imprecision, georadar, or GPS systems, for example, provide a region where the target can be located with a certain probability, but are not able to provide exact positions of the target instances.…”
Section: Building the Original Data Matrix
confidence: 99%
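As a concrete illustration of the pre-processing step this excerpt describes (folding several redundant sensor readings into a single indicator and treating signal noise before modelling), here is a minimal sketch using pandas. The column names, the three-sensor mean, and the 5-sample rolling window are illustrative assumptions, not details from the cited work.

```python
# Sketch: merge redundant sensor columns into one indicator and apply a
# simple rolling-mean filter as a pre-processing step for signal noise.
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=8, freq="min"),
    "level_s1": [1.02, 1.05, 0.98, 1.10, 1.04, 1.01, 0.97, 1.06],
    "level_s2": [1.00, 1.07, 0.95, 1.12, 1.03, 1.00, 0.99, 1.05],
    "level_s3": [1.01, 1.04, 0.99, 1.09, 1.05, 1.02, 0.98, 1.07],
})

sensor_cols = ["level_s1", "level_s2", "level_s3"]
# Single indicator: mean of the redundant sensors at each timestamp.
df["mean_level"] = df[sensor_cols].mean(axis=1)
# Noise treatment at pre-processing: centred rolling mean over 5 samples.
df["mean_level_smoothed"] = df["mean_level"].rolling(window=5, center=True,
                                                     min_periods=1).mean()
print(df[["timestamp", "mean_level", "mean_level_smoothed"]])
```

A centred rolling mean is only one simple smoothing choice; a median filter or domain-specific calibration could equally serve as the noise treatment at this stage.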
“…Domingos et al. [32] found that NB performance is competitive with more sophisticated ML methods, such as DT, IBL, and rule induction, even if the features' dependency is very strong. Moreover, NB is a strongly noise-tolerant algorithm [33], [34]. Nettleton et al. [33] performed a systematic analysis of robustness of many ML algorithms to noise, namely NB, C4.5, IBk, and SMO.…”
Section: B. Naïve Bayesian (NB)
confidence: 99%