General elections are a central part of the political process, and many political figures take part in them. Electability is a major concern for these figures, and various efforts are made to increase it. Media, including online news media, have become an important tool for doing so. Reader comments can serve as an assessment of political figures through sentiment analysis. However, analyzing sentiment in comments on online news media is not easy, because the comments are unstructured text, and Indonesian text is especially challenging. Text pre-processing is an important step in text mining for extracting the basic information contained in the comments. This research performs Indonesian text pre-processing with the Gata Framework Text Mining. Information is then extracted using the Naïve Bayes and Support Vector Machine classification algorithms, both of which are optimized using Particle Swarm Optimization. Tests of both methods show that the Support Vector Machine based on Particle Swarm Optimization performs best, with an accuracy of 78.40% and an AUC of 0.850. The study thus identifies an algorithm that is effective in classifying positive and negative comments about political figures from online news media.
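As a rough illustration of the classification step, the following is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace smoothing. It is not the study's implementation: the four training comments are invented toy data, the function names `train_nb` and `predict_nb` are hypothetical, and the Particle Swarm Optimization and Support Vector Machine parts are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count documents per class and words per class.

    docs is a list of (tokens, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        # Laplace (add-one) smoothing over the vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy comments, already tokenized (assumption, not study data).
TRAINING = [
    ("pemimpin bagus jujur".split(), "positive"),
    ("kerja bagus hebat".split(), "positive"),
    ("pemimpin buruk korup".split(), "negative"),
    ("kerja buruk bohong".split(), "negative"),
]

model = train_nb(TRAINING)
print(predict_nb(model, ["bagus", "jujur"]))
print(predict_nb(model, ["buruk", "korup"]))
```

In practice the tokens would come from the pre-processing stage, and accuracy and AUC would be measured on held-out comments rather than on the training examples.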
Research in text mining still mostly uses text in English, Arabic, Chinese, or other languages, while work on Indonesian text remains very limited, so good tools are needed to help Indonesian researchers conduct text mining research in Indonesian. Text mining requires pre-processing steps such as removing the ‘@’ notation, removing ‘http’ links, removing Indonesian stopwords, normalizing acronyms, slang words, and emoticons, and Indonesian stemming. The GATA Framework Text Mining is one option for conducting text mining research in Indonesian and has been used by several researchers. Several data mining process methods are known, including KDD, CRISP-DM, and SEMMA, all three of which are quite reliable. CRISP-DM, which consists of Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment, is widely used in text mining research and can be combined with text pre-processing. As text mining research in Indonesian grows, pre-processing for Indonesian becomes very important. The GATA Framework is a pre-processing option that can be combined with RapidMiner, as seen from the excellent FUPRS results.
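Some of the pre-processing steps listed above can be sketched in a few lines. This is only an illustration and does not use or reproduce the GATA Framework: the function name `preprocess`, the tiny stopword set, and the slang dictionary are all hypothetical, and acronym normalization, emoticon handling, and Indonesian stemming are left out.

```python
import re

# Tiny illustrative subsets (assumptions); real lists are far larger.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
SLANG = {"gak": "tidak", "bgt": "banget", "yg": "yang"}


def preprocess(comment):
    """Clean one comment and return a list of tokens."""
    text = comment.lower()
    # Remove links starting with http(s) or www.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove '@' mentions.
    text = re.sub(r"@\w+", " ", text)
    # Keep letters only; punctuation and digits become spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Normalize slang words, then drop stopwords.
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("@budi Pemimpin yg jujur bgt! http://contoh.id/berita"))
```

Slang is normalized before stopword removal so that a slang form of a stopword (here "yg" for "yang") is also filtered out.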
Competition in the cosmetics business in Indonesia is increasing rapidly, and cosmetics customers have spread across various brands according to their tastes. Customers are a very important asset for a company's business continuity, so good customer management can increase company revenue. However, it is not easy to manage customers, or to carry out appropriate business strategies, without understanding their characteristics. A customer analysis method that can provide recommendations for the company is therefore required. RFM is one of the most widely used methods for analyzing customers through segmentation and profiling. In addition to segmentation, the customer profile is a very important factor in analyzing customers, and ALC is one form of customer profile that can be used. The RFM + ALC method is not easy to apply to very large customer history data, so data mining is needed to support the RFM + ALC analysis. Clustering with K-Means, with the Elbow method used to find the optimal number of clusters K, can serve as a model for RFM segmentation, while the Naïve Bayes and Decision Tree classification methods can be used to determine the most influential ALC customer profile factors. The clustering model produces two dominant customer segments. The Naïve Bayes classification model of the ALC factors can provide recommendations for the most influential customer profiles, achieving the higher accuracy of the two classifiers, 65.87%, compared with the Decision Tree.
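To make the RFM part concrete, the following minimal sketch derives recency, frequency, and monetary values per customer from a transaction list. The transactions, the reference date, and the function name `rfm` are invented for illustration and are not taken from the study's data; the K-Means clustering, Elbow method, and ALC classification steps are not shown.

```python
from datetime import date


def rfm(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions is a list of (customer_id, purchase_date, amount).
    """
    table = {}
    for cust, day, amount in transactions:
        last, freq, money = table.get(cust, (day, 0, 0.0))
        # Track the latest purchase date, the purchase count, and total spend.
        table[cust] = (max(last, day), freq + 1, money + amount)
    return {
        cust: ((today - last).days, freq, money)
        for cust, (last, freq, money) in table.items()
    }


# Invented sample transactions (assumption, not study data).
TRANSACTIONS = [
    ("A", date(2020, 1, 10), 100.0),
    ("A", date(2020, 2, 1), 50.0),
    ("B", date(2020, 1, 5), 30.0),
]

scores = rfm(TRANSACTIONS, today=date(2020, 3, 1))
for customer, (recency, frequency, monetary) in sorted(scores.items()):
    print(customer, recency, frequency, monetary)
```

The resulting three values per customer are the kind of feature vector that a clustering step such as K-Means could then group into segments.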