The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining technique. The main drawback of this technique appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning it into an imprecise and, above all, inefficient technique. These disadvantages have been the subject of research for many years, and, among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these weaknesses have been turned into strengths, and the k‐nearest neighbors rule has become a core algorithm for identifying and correcting imperfect data, removing noisy and redundant samples, or imputing missing values, thus transforming Big Data into Smart Data, that is, data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this Smart Data gleaning algorithm in a supervised learning context is investigated. This includes a brief overview of Smart Data, current and future trends for the k‐nearest neighbors algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging Big Data‐ready versions of these algorithms and develop new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets that provides guidelines on how to use the k‐nearest neighbors algorithm to obtain Smart/Quality Data for a high‐quality data mining process. Moreover, multiple Spark Packages have been developed, implementing all the Smart Data algorithms analyzed.
This article is categorized under:
Technologies > Data Preprocessing
Fundamental Concepts of Data and Knowledge > Big Data Mining
Technologies > Classification
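
As a minimal illustration of the kind of preprocessing discussed above, the sketch below combines kNN‐based missing‐value imputation with an ENN‐style noise filter. It is an assumption for exposition only: it uses scikit‐learn on synthetic data rather than the Spark Packages referred to in the article, and the parameter choices (k = 5, 5% injected missingness) are arbitrary.

```python
# Illustrative sketch (not the article's Spark implementation): kNN-based
# missing-value imputation followed by an ENN-style noise filter.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Inject some missing values to simulate imperfect data.
mask = rng.random(X.shape) < 0.05
X_missing = X.copy()
X_missing[mask] = np.nan

# 1) kNN imputation: each missing entry is filled from the values of that
#    feature among the k nearest neighbors of the incomplete sample.
X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)

# 2) ENN-style noise filtering: discard samples whose class disagrees with
#    the majority vote of their k nearest neighbors (excluding themselves).
k = 5
knn = KNeighborsClassifier(n_neighbors=k + 1).fit(X_imputed, y)
_, idx = knn.kneighbors(X_imputed)       # neighbor indices, self included
neighbor_labels = y[idx[:, 1:]]          # drop the self-neighbor
agreement = (neighbor_labels == y[:, None]).mean(axis=1)
keep = agreement >= 0.5                  # majority of neighbors agree
X_smart, y_smart = X_imputed[keep], y[keep]

print(f"Kept {keep.sum()} of {len(y)} samples after noise filtering.")
```

The resulting (X_smart, y_smart) stands in for the "Smart Data" that a downstream data mining algorithm would consume; distributed counterparts of these two steps are what the Big Data‐ready versions discussed in the article provide.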