Highly skewed data in artificial intelligence, machine learning, and data mining often produce misleading results, because machine learning algorithms are designed to work best with balanced data. In practice, however, imbalanced data are common. The most popular technique for handling imbalanced data is resampling the dataset to adjust the number of instances in the majority and minority classes toward a balanced distribution. Many resampling techniques (oversampling, undersampling, or a combination of both) have been proposed, and new ones continue to appear. Resampling may increase or decrease classifier performance. Comparative research on resampling methods for structured data has been widely carried out, but studies that compare resampling methods on unstructured data are rare. This raises several questions, one of which is whether these methods are applicable to unstructured data such as text, which is high-dimensional and highly diverse in character. To understand how different resampling techniques affect the learning of classifiers on imbalanced text data, we perform an experimental analysis using various resampling methods with several classification algorithms to classify articles in the Indonesian Scientific Journal Database (ISJD). The experiment shows that resampling techniques generally improve classifier performance on imbalanced text data, but the improvement is not significant because text data are very diverse and high-dimensional.
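To make the two basic resampling strategies named in the abstract concrete, the following is a minimal sketch of random oversampling and random undersampling for labeled text documents. It is not the method used in the study, which does not list its specific resampling algorithms here; the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the sample documents are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(texts, labels):
    """Collect documents into one list per class label."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen documents until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(docs) for docs in by_class.values())
    out_texts, out_labels = [], []
    for label, docs in by_class.items():
        # Draw extra copies with replacement from the class's own documents.
        extra = [rng.choice(docs) for _ in range(target - len(docs))]
        out_texts.extend(docs + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Keep a random subset of each class so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(docs) for docs in by_class.values())
    out_texts, out_labels = [], []
    for label, docs in by_class.items():
        # Sample without replacement, discarding the surplus documents.
        out_texts.extend(rng.sample(docs, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


if __name__ == "__main__":
    # Hypothetical toy corpus: six majority-class and two minority-class documents.
    texts = [f"document {i}" for i in range(8)]
    labels = ["majority"] * 6 + ["minority"] * 2
    _, over_labels = random_oversample(texts, labels)
    _, under_labels = random_undersample(texts, labels)
    print(Counter(over_labels))
    print(Counter(under_labels))
```

Both functions operate on raw documents, so they can run before any text vectorization. A common precaution is to resample only the training split, so that duplicated documents do not leak into the evaluation data.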
High-quality data and effective data quality assessment are needed for data standardization in research data repositories. The three most commonly used attributes, completeness, accuracy, and timeliness, are dimensions of data quality assessment. The purpose of this research is to deepen understanding and discuss in depth the research on this topic. To support the research, we use a traditional literature review of the Scopus database and several leading websites to identify relevant studies. The review is limited to the following document types: articles, books, proceedings, and reviews. Search results are filtered using the keywords data quality, data quality assessment, data quality dimensions, quality assessment, data accuracy, and data completeness. The retrieved documents are analyzed for relevance and then compared to identify differences in the concepts and methods used in data quality metrics. The results of the analysis can serve as recommendations for data quality assessment in the National Scientific Repository.
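As an illustration of how two of the dimensions named above could be turned into numeric metrics, the sketch below computes a completeness score and a timeliness score for a list of metadata records. The abstract does not define its metrics, so both formulas are assumptions: completeness is taken as the share of required fields that are filled, and timeliness as the share of records updated within a chosen age limit. The function names, the `updated` field, and the sample records are hypothetical.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of required field slots that hold a non-empty value (assumed definition)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def timeliness(records, today, max_age_days, date_field="updated"):
    """Share of records whose date_field is at most max_age_days old (assumed definition)."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for record in records
        if record.get(date_field) is not None
        and (today - record[date_field]).days <= max_age_days
    )
    return fresh / len(records)


if __name__ == "__main__":
    # Hypothetical repository records; the second one lacks an author and is stale.
    records = [
        {"title": "Dataset A", "author": "Researcher X", "updated": date(2024, 1, 1)},
        {"title": "Dataset B", "author": "", "updated": date(2020, 1, 1)},
    ]
    print(completeness(records, ["title", "author"]))
    print(timeliness(records, today=date(2024, 6, 1), max_age_days=365))
```

Accuracy is left out of the sketch because measuring it requires a trusted reference source to compare against, which the abstract does not specify.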
Scientific data repositories play a major role in science because they allow data to be reused, reproduced, and preserved over the long term. In Indonesia, no institution manages a scientific data repository; institutions generally manage only publications such as books, journals, and proceedings. This is because most research data are still managed by individual researchers or research groups. Through a literature study and a survey of journal publishers, the authors seek information on how research data can be managed through publications. The results of the literature study are compared with the survey results, yielding an important finding: journal publishers strongly agree with establishing a policy that requires authors to attach research data to every submitted paper. Most journal publishers use Open Journal System (OJS) to manage journal articles, from receipt of the manuscript through publication. Through this mechanism, the attached research data will be stored automatically in the scientific data repository system via an Application Programming Interface (API).