Abstract-Demand for information retrieval over scientific papers has grown over time. Researchers have difficulty searching for information in experimental scientific papers because existing retrieval engines extract text-mining-based features from the entire text, whereas experimental papers have specific contents that call for a different treatment in feature extraction. In this paper, we propose a new system for information retrieval on experimental scientific papers. The system consists of four main functions: (1) specific content-based feature extraction, (2) a classification model, (3) context-based subspace selection, and (4) context-dependent similarity measurement. In feature extraction, the system extracts features from experimental scientific papers in four specific content-based categories: data, problem, method, and result. To evaluate the applicability of the proposed system, we tested the 77 papers in the dataset using Leave-One-Out validation with several classification algorithms (Nearest Neighbour, Naive Bayes, Support Vector Machine, and Decision Tree), which achieved on average a precision of 66.65% and an accuracy of 76.18%. In a further experiment on similarity measurement, the proposed system achieved an accuracy of 79.17%.

Keywords-Scientific experimental paper, Context-based subspace selection, Context-dependent similarity measurement.

Intisari (Indonesian abstract)-Demand for information search in scientific papers has increased over time. Current information retrieval engines are limited in the search process because they rely on text-mining-based feature extraction from the entire text, whereas experimental scientific papers have specific content. In this paper, we propose a system for information retrieval on experimental scientific papers. The system consists of four functions: (1) content-based feature extraction, (2) a classification model, (3) context-based subspace selection, and (4) context-dependent similarity measurement. In context-based subspace selection, the system reduces dimensionality by selecting the subspace that corresponds to the contexts chosen by the user. To obtain the final search results, we measure context similarity by constructing a context-based metric from the dataset to the paper. To evaluate the applicability of the proposed system, we tested the 77 papers in the dataset using Leave-One-Out validation with several classification algorithms (Nearest Neighbor, Naive Bayes, Support Vector Machine, and Decision Tree), which achieved on average a precision of 66.65% and an accuracy of 76.18%. We also ran an experiment on similarity measurement by supplying a query paper and the desired contents (data, result, method, and problem) as the context given by the user. In the similarity measurement experiment, the proposed system achieved an accuracy of 79.17%.
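The interplay of context-based subspace selection and context-dependent similarity measurement can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the similarity measure, so cosine similarity is assumed here, and the function names (`select_subspace`, `cosine`, `rank`), the term weights, and the two example papers are all hypothetical. Each paper is represented as weighted terms grouped under the four content categories named in the abstract (data, problem, method, result); the user's chosen contexts decide which groups take part in the comparison.

```python
import math

# The four content categories named in the abstract.
CATEGORIES = ("data", "problem", "method", "result")


def select_subspace(features, contexts):
    """Keep only the feature dimensions whose category is in `contexts`.

    `features` maps category -> {term: weight}. The result is a flat sparse
    vector keyed by (category, term). This layout is an assumption.
    """
    return {
        (cat, term): weight
        for cat in contexts
        for term, weight in features.get(cat, {}).items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed measure)."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, papers, contexts):
    """Rank paper names by similarity to `query` within the chosen contexts."""
    q = select_subspace(query, contexts)
    scored = [
        (cosine(q, select_subspace(feats, contexts)), name)
        for name, feats in papers.items()
    ]
    # Highest score first; ties broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy data, not taken from the paper's dataset.
query = {"data": {"images": 1.0}, "method": {"svm": 1.0}}
papers = {
    "paper_a": {"data": {"images": 1.0}, "method": {"decision_tree": 1.0}},
    "paper_b": {"data": {"text": 1.0}, "method": {"svm": 1.0}},
}

print(rank(query, papers, ["data"]))    # ['paper_a', 'paper_b']
print(rank(query, papers, ["method"]))  # ['paper_b', 'paper_a']
```

The same query returns a different ranking depending on the selected context, which is the behaviour the abstract attributes to context-dependent similarity: restricting the comparison to the "data" subspace favours the paper with matching data, while the "method" subspace favours the paper with the matching method.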