<p>The ease of obtaining information today has helped people in many ways, one of which is finding reviews of new places to eat. Visitors search for reviews because they do not know what service a place offers. Reviews can also benefit sellers, because they reveal the experiences of their visitors. As a result, many people exploit reviews by writing spam reviews. Spam reviews can be effectively detected using machine learning. However, many spam review datasets are imbalanced, which can affect classification results. This study therefore uses the BOS method to handle the class imbalance and the SVM method for classification. The research stages are preprocessing, word weighting with TF-IDF together with sentiment features from lexicon-based features, dataset balancing with BOS, and classification with SVM. BOS and SVM were tested by splitting the data into training and test sets at 80%:20%, searching for the best parameters on the training data with 5-fold cross-validation, and evaluating on the test data. The best parameter for BOS and SVM was N = 400%, which gave an accuracy of 78.6%, a precision of 19.7%, a recall of 17.1%, an f-measure of 14.4%, and a g-mean of 32%. Therefore, using BOS can improve the evaluation results of spam review classification.</p>
<p><strong><em>Abstract</em></strong></p>
<p><em>The convenience of obtaining information nowadays has helped many people, for example in looking for reviews of new places to eat. People search for reviews because they do not know the services a place offers. Reviews can also benefit sellers, because they reveal the experiences their visitors have had. As a result, many people abuse reviews by creating spam reviews. Spam reviews can be effectively detected using machine learning. However, many spam review datasets are imbalanced, which may affect classification results. In this study, the BOS algorithm was used to overcome the data imbalance, and the SVM algorithm was used to classify spam reviews. The research stages are preprocessing, word weighting with TF-IDF and sentiment features using lexicon-based features, dataset balancing with BOS, and classification with SVM. The steps in testing BOS and SVM are splitting the data into training and test data at 80%:20%, searching for the best parameters on the training data with 5-fold cross-validation, and evaluating on the test data. The best parameter value for BOS and SVM was N = 400%, which gave an accuracy of 78.6%, a precision of 19.7%, a recall of 17.1%, an f-measure of 14.4%, and a g-mean of 32%. Therefore, using BOS can improve the evaluation results of spam review classification.</em></p>
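<p>The abstract does not spell out how BOS selects the samples it over-samples or how N is applied. As an illustration only, the sketch below assumes a Borderline-SMOTE-style procedure and reads N = 400% in the usual SMOTE sense: four synthetic samples per selected minority sample. The function name <code>borderline_oversample</code>, the neighbour counts <code>k</code> and <code>m</code>, and the toy data are hypothetical; this is a sketch under those assumptions, not the study's implementation. In the described pipeline the input vectors would be the TF-IDF and lexicon sentiment features, and the balanced output would be passed to the SVM.</p>

```python
import math
import random


def borderline_oversample(X, y, minority=1, n_percent=400, k=5, m=5, seed=0):
    """Borderline-SMOTE-style over-sampling, used here only as a stand-in for BOS.

    X: list of feature vectors (lists of numbers), y: list of class labels.
    n_percent: amount of over-sampling in percent; 400 means 4 synthetic
    samples are generated for every borderline ("danger") minority sample.
    Returns new lists: (X + synthetic vectors, y + synthetic labels).
    """
    rng = random.Random(seed)
    per_sample = n_percent // 100
    min_idx = [i for i, label in enumerate(y) if label == minority]

    def nearest(i, pool, count):
        # Indices from `pool` (excluding i), sorted by Euclidean distance to X[i].
        others = [j for j in pool if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:count]

    # Step 1: a minority sample is "danger" (borderline) when at least half,
    # but not all, of its m nearest neighbours in the full data are majority.
    danger = []
    for i in min_idx:
        neigh = nearest(i, range(len(X)), m)
        n_major = sum(1 for j in neigh if y[j] != minority)
        if m / 2 <= n_major < m:
            danger.append(i)

    # Step 2: SMOTE interpolation from each danger sample towards randomly
    # chosen members of its k nearest minority neighbours.
    new_X, new_y = [], []
    for i in danger:
        neigh = nearest(i, min_idx, k)
        if not neigh:
            continue
        for _ in range(per_sample):
            j = rng.choice(neigh)
            gap = rng.random()  # uniform in [0, 1)
            new_X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            new_y.append(minority)
    return list(X) + new_X, list(y) + new_y


# Toy usage (hypothetical data): three majority points, four minority points.
X = [[0, 0], [0, 1], [1, 0],                  # majority, label 0
     [1.5, 1.5], [2, 2], [2, 3.5], [3.5, 2]]  # minority, label 1
y = [0, 0, 0, 1, 1, 1, 1]
X_bal, y_bal = borderline_oversample(X, y, n_percent=400, k=3, m=3)
print(len(X_bal), y_bal.count(1))  # → 11 8
```

<p>In the toy data only the minority point (1.5, 1.5) has a majority-dominated neighbourhood, so it alone receives four synthetic neighbours; minority points deep inside their own cluster are left untouched, which is what distinguishes borderline over-sampling from plain SMOTE.</p>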
<p>The ease of obtaining information today has helped our daily lives, for example by letting us look up reviews to weigh which place or item to choose. Some people exploit this by writing spam reviews for their own benefit, so spam review detection is much needed. Transformer models are currently widely applied in natural language processing because of their very good performance. There are two approaches to using Transformer models: pre-training and fine-tuning. Many previous studies have used fine-tuning of Transformer models because it is easier to train and requires less time, lower cost, and fewer environmental resources than pre-training. However, few previous studies have compared deep learning models with fine-tuned models specifically for spam review detection. This study compares fine-tuned Transformer models with a deep learning method, namely CNN with various pretrained word embeddings, for spam review detection on the Ott dataset. The RoBERTa model outperformed the other Transformer and deep learning models, with an accuracy of 90.8%, a precision of 90%, a recall of 91.8%, and an f1-score of 90.8%. In terms of training computation time, however, DistilBERT was the fastest, at 200.5 seconds. Overall, both the Transformer and deep learning models performed well for spam review detection on the Ott dataset.</p>
<p><strong><em>Abstract</em></strong></p>
<p><em>The ease of obtaining information today has helped our lives, for example by letting us look up reviews to weigh which place or item to choose. Some people take advantage of this by creating spam reviews for their own benefit, so spam review detection is needed. Transformer models are currently widely applied to natural language processing because of their outstanding performance. The two approaches for Transformer models are pre-training and fine-tuning. Many previous studies have used fine-tuning because it is easier to train and requires less time, lower cost, and fewer environmental resources than pre-training. However, few previous studies compare deep learning models with fine-tuned models specifically for spam review detection. This study compares Transformer models using a fine-tuning approach with a deep learning method, namely CNN with various pre-trained word embeddings, for detecting spam reviews in the Ott dataset. The RoBERTa model outperformed the other Transformer and deep learning models, with an accuracy of 90.8%, a precision of 90%, a recall of 91.8%, and an f1-score of 90.8%. The DistilBERT model obtained the shortest training computation time, at 200.5 seconds. Overall, both the Transformer and deep learning models perform well in detecting spam reviews in the Ott dataset.</em></p>
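<p>The abstract compares fine-tuned Transformers with a CNN over pretrained word embeddings but does not describe the CNN layers. As an illustration only, the sketch below shows the forward pass of a single-layer convolutional text classifier of the kind commonly paired with pretrained embeddings: window filters, ReLU, max-over-time pooling, and a sigmoid output. The function names, the toy 2-dimensional vectors, the filter, and the weights are all hypothetical and untrained; in practice the vectors would come from a pretrained embedding and the weights from training on the Ott dataset.</p>

```python
import math


def sentence_matrix(tokens, embedding, dim):
    """Stack pretrained word vectors for a tokenised review (one row per word).
    Out-of-vocabulary words become zero vectors (an assumption for this sketch)."""
    return [embedding.get(tok, [0.0] * dim) for tok in tokens]


def conv_feature(matrix, filt, bias=0.0):
    """One convolution filter spanning len(filt) consecutive words:
    dot product + bias, ReLU, then max-over-time pooling to one scalar."""
    width = len(filt)
    activations = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        score = bias + sum(
            f * x for f_row, x_row in zip(filt, window) for f, x in zip(f_row, x_row)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations, default=0.0)  # reviews shorter than the filter -> 0.0


def spam_probability(tokens, embedding, dim, filters, weights, out_bias=0.0):
    """Pooled filter outputs -> linear layer -> sigmoid probability of 'spam'."""
    matrix = sentence_matrix(tokens, embedding, dim)
    features = [conv_feature(matrix, filt) for filt in filters]
    z = out_bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage with hypothetical 2-d "pretrained" vectors and one width-2 filter
# that sums the first embedding dimension of two consecutive words.
emb = {"great": [1.0, 0.0], "amazing": [1.0, 0.5], "hotel": [0.0, 1.0]}
bigram_filter = [[1.0, 0.0], [1.0, 0.0]]
tokens = ["great", "amazing", "hotel"]
print(conv_feature(sentence_matrix(tokens, emb, 2), bigram_filter))  # → 2.0
print(spam_probability(tokens, emb, 2, [bigram_filter], [1.0], out_bias=-2.0))  # → 0.5
```

<p>Max-over-time pooling makes each filter's output independent of review length, so reviews of any length map to one value per filter; a practical model would use many filters of several widths.</p>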