High-dimensional data analysis has become one of the most challenging tasks today, and dimensionality reduction plays an important role in it. Dimensionality reduction focuses on data features, which have a proven impact on accuracy, execution time, and space requirements. In this study, a dimensionality reduction method based on the convolution of input features is proposed. The experiments are carried out on nine minimally preprocessed benchmark datasets. Results show that the proposed method reduces the original dimensions by 38% on average. The accuracy of the algorithm is tested using the decision tree (DT), support vector machine (SVM), and K-nearest neighbor (KNN) classifiers and compared against the existing principal component analysis (PCA) algorithm. The average increase in accuracy (Δ) is 8.06 for DT, 5.80 for SVM, and 18.80 for KNN. The most significant characteristic of the proposed model is that it reduces the number of attributes, leading to less computation time without loss of classifier accuracy.
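The abstract does not give the kernel or the exact convolution scheme, so the following is only a minimal sketch of how convolving input features can shrink the feature count. It assumes a 1-D convolution in "valid" mode applied to each sample; the function name `convolve_features`, the averaging kernel, and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def convolve_features(X, kernel):
    """Convolve each row (sample) of X with a 1-D kernel in 'valid' mode.

    A row with n features and a kernel of length k yields n - k + 1 features.
    This is an illustrative assumption, not the paper's published method.
    """
    X = np.asarray(X, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    return np.array([np.convolve(row, kernel, mode="valid") for row in X])

# Toy data: 3 samples with 8 features each (hypothetical values).
X = np.arange(24, dtype=float).reshape(3, 8)
# Hypothetical averaging kernel of length 4.
kernel = np.ones(4) / 4.0

X_reduced = convolve_features(X, kernel)
print(X.shape, "->", X_reduced.shape)
```

With these assumed sizes, each sample goes from 8 features to 8 - 4 + 1 = 5. The reduced matrix could then be passed to any classifier in place of the original features.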
Data analytics is a very common term today. Data is collected from various sources and analyzed for decision making. These decisions help grow businesses, support healthcare, and keep track of useful information on communication media. For these purposes, data may be shared, stored, and analyzed. Each of these three processes carries a threat of data leakage to hackers. To prevent this, privacy preservation algorithms are used. This chapter discusses privacy-preserving techniques across the whole pipeline, from data collection through data storage to analytics. Data classification techniques are also discussed to aid understanding of machine learning data analytics. Finally, open issues in privacy preservation are discussed.
Privacy is the main concern in cyberspace because every single click of a user on the Internet is recorded and analyzed for different purposes, such as credit card purchase records, healthcare records, business, personalized shopping experiences, marketing strategy decisions, and more. Here, the processing of the user’s personal information is considered a risk. Though data mining applications focus on statistically useful patterns and not on the personal data of individuals, there is a threat of unrestricted access to individual records. It is also necessary to maintain the secrecy of data while retaining the accuracy and quality of data classification. For real-time applications, the data analytics should be time efficient. The proposed Convolution-based Privacy Preserving Algorithm (C-PPA) transforms the input into lower dimensions while preserving privacy, which leads to better mining accuracy. The proposed algorithm is evaluated using metrics such as accuracy, precision, recall, and F1-measure. Simulations show that the average increase in accuracy with C-PPA is 14.15 for a Convolutional Neural Network (CNN) classifier when compared with results without C-PPA. Overlap-add C-PPA, which is based on overlap-add convolution, is proposed for parallel processing. It shows an average accuracy increase of 12.49 for CNN. The analysis shows that the algorithm offers benefits in privacy preservation, data utility, and performance. Since the algorithm lowers the dimensions of the data, the communication cost over the Internet is also reduced.
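The abstract does not say how C-PPA applies overlap-add convolution, so the sketch below shows only the generic overlap-add technique it refers to: the input is split into blocks, each block is convolved independently (which is what makes parallel processing possible), and the overlapping tails are summed. The function name `overlap_add_convolve`, the block size, and the sample signal are assumptions for illustration.

```python
import numpy as np

def overlap_add_convolve(x, h, block_size):
    """Full linear convolution of x with h, computed block by block."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # Each block's convolution is independent of the others,
        # so this step could be distributed across workers.
        part = np.convolve(block, h)        # length len(block) + len(h) - 1
        y[start:start + len(part)] += part  # overlapping tails are summed
    return y

# Hypothetical input signal and kernel.
x = np.arange(10, dtype=float)
h = np.array([1.0, 2.0, 1.0])

y = overlap_add_convolve(x, h, block_size=4)
# The blockwise result matches the direct convolution.
assert np.allclose(y, np.convolve(x, h))
```

The loop here runs sequentially for clarity; the per-block convolutions are the part that a parallel implementation would run concurrently before the final summation.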