The quality of a movie can be gauged from the opinions or reviews of previous audiences. Sentiment classification groups these reviews into positive and negative opinions. The Support Vector Machine (SVM) is one of the data mining algorithms most frequently used in research because it works well for text classification, but it is very sensitive to the selection of features. Using Information Gain as the feature selection method can solve the problem faster and with more stable convergence. The approach was tested on two movie review datasets, Cornell and Stanford. On the Cornell dataset, the SVM alone produced an accuracy of 83.05%, while the SVM based on Information Gain reached 85.65%, an increase of 2.6 percentage points. On the Stanford dataset, the SVM alone yielded 86.46%, while the SVM based on Information Gain reached 86.62%, a reported increase of 0.166 percentage points. For movie review sentiment analysis, the SVM based on Information Gain proved to give more accurate results.
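To make the feature selection step concrete, here is a minimal sketch of how Information Gain can rank terms before they are passed to a classifier. It is not the authors' implementation: the toy reviews, the set-of-tokens representation, and the function names `entropy` and `information_gain` are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting the reviews on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term) if part
    )
    return entropy(labels) - remainder

# Hypothetical toy reviews, each reduced to a set of tokens (an assumption).
docs = [{"great", "movie"}, {"great", "plot"}, {"boring", "movie"}, {"boring", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

# Score every term and keep the two most informative ones.
scores = {t: information_gain(docs, labels, t) for t in set().union(*docs)}
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

In this toy data, "great" and "boring" separate the classes perfectly and score 1 bit each, while "movie" and "plot" appear equally in both classes and score 0, so only the first two would be kept as features.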
Technology is developing rapidly, and that development is inseparable from large-scale data processing in all areas, such as information technology, computer science, medicine, finance, and others. This creates a heavy computational burden when processing data. Analyzing very large data demands substantial processing, so in this study the authors propose data mining as a solution to the large-scale data processing problem. Data mining is divided into several techniques, including classification, which aims to sort large amounts of data into relevant information. The authors compared 5 classification algorithms to find the best performance on classification problems, analyzing and testing them on 4 different datasets. The results show that SVM performs much better than the 4 comparison methods in terms of AUC on the 4 datasets from the UCI Repository. The LSVT dataset shows the highest AUC value at 0.973; the others are Ionosphere at 0.887, Sonar at 0.897, and Heart Statlog at 0.868.
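Since the comparison above rests on AUC, a short sketch of how that metric can be computed may help. It uses the pairwise definition of AUC (the probability that a randomly chosen positive example is scored above a randomly chosen negative one). The scores for `model_a` and `model_b` are invented for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores on six labelled examples (assumed data).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b = [0.6, 0.3, 0.4, 0.5, 0.7, 0.1]

auc_a = auc(model_a, labels)  # 8 of 9 pairs ranked correctly
auc_b = auc(model_b, labels)  # 4 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values such as 0.973 indicate strong separation between classes.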
The intrusion detection system (IDS) is an important analysis component. The problem arising in IDS concerns the collection of datasets in a computer network. To raise the detection rate and lower the false positive rate, a machine learning approach is applied. The data mining algorithm used is Naïve Bayes (NB), one of the most widely used algorithms in this field due to its simplicity, efficiency, and effectiveness. NB has high accuracy and speed when applied to databases with large data. However, the NB algorithm assumes that attributes are independent, and it is very sensitive to the selection of features: too many features interfere with its performance and lower its accuracy, while in practice the features are often interrelated. The Feature Dependent Naïve Bayes (FDNB) method is an effective way to address these problems in NB. It computes features as pairs and creates dependencies between them, and it is applied together with cross-validation, feature selection, and data preprocessing steps, which gives better accuracy. In tests of the two models, the Naïve Bayes algorithm achieved an accuracy of 84.42%, while FDNB with oversampling (CFS + GS) achieved 94.58%, FDNB with oversampling (CFS + BFS) achieved 94.69%, and FDNB with SMOTE (CFS + GS) and FDNB with SMOTE (CFS + BFS) both achieved 93.27%. Averaged per attack type, DoS attacks show the highest accuracy at 97.86%. U2R attacks are classified with 93.80% accuracy and an F-measure of 96.26%, which can be considered a very good result because U2R attacks are considered very dangerous.
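The oversampling step mentioned above can be sketched as follows. This is plain random oversampling, which duplicates minority-class rows until every class matches the largest one. It is not SMOTE, which instead synthesizes new samples by interpolating between neighbours. The function name `random_oversample` and the toy class counts are assumptions, not details from the paper.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced traffic sample: U2R is rare, as in many IDS datasets.
labels = ["normal"] * 6 + ["dos"] * 3 + ["u2r"] * 1
rows = [[i] for i in range(len(labels))]  # placeholder feature vectors

balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

Balancing should be applied to the training folds only. Duplicating rows before splitting would leak copies of the same record into the evaluation data.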
Data analysis for datasets with very large dimensions requires classification to make predictions from large datasets. This study compares methods for classifying large data, where the data are processed to obtain the desired prediction information. The Support Vector Machine (SVM) is used as a single algorithm to provide baseline classification results, which are then compared with those of SVM combined with Particle Swarm Optimization (PSO) to see which predicts better on the datasets. In the experiments, SVM obtained an accuracy of 81.85% and an AUC of 0.823, while SVM-PSO obtained an accuracy of 84.81% and an AUC of 0.898.
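The abstract above does not say what PSO optimizes in the SVM-PSO combination. The sketch below assumes it searches over two continuous values, such as the logarithms of the SVM parameters C and gamma, and uses a hypothetical stand-in objective in place of a real cross-validated error. The function names, swarm settings, and bounds are all illustrative assumptions.

```python
import random

def pso(objective, bounds, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal best and the swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

def stand_in_error(params):
    """Hypothetical objective: a smooth bowl standing in for cross-validated SVM error."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_error = pso(stand_in_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

In a real experiment, `stand_in_error` would be replaced by a function that trains an SVM with the candidate parameters and returns its validation error, which makes each evaluation far more expensive than in this toy case.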