The National Nuclear Energy Agency of Indonesia (BATAN) taxonomy organizes nuclear competence fields into six categories. The Polytechnic Institute of Nuclear Technology, as an institution of nuclear education, faces a challenge in organizing student publications according to the fields in the BATAN taxonomy, especially in the library. The goal of this research is to determine the most efficient automatic document classification model, using text mining, for categorizing student final project documents written in Indonesian and for monitoring the development of the nuclear field in each category. The kNN algorithm is used to classify documents, and the best model is identified by comparing Cosine Similarity, Correlation Similarity, and Dice Similarity, each combined with two vector creation schemes: binary term occurrence and TF-IDF. A total of 99 labeled documents were obtained from the BATAN repository as reference data, and 536 unlabeled final project documents were prepared for prediction. Several text mining steps were applied: stemming, stop word filtering, n-grams, and filtering by length. With k = 4, Cosine-binary is the best model, with an accuracy of 97 percent; for Indonesian-language documents, kNN performs better with binary term occurrence than with TF-IDF. Engineering of Nuclear Devices and Facilities is the most popular field among students, while Management is the least preferred. However, Isotopes and Radiation are the most prominent fields in Nuclear Technochemistry. Text mining can assist librarians in grouping documents based on specific criteria. It also makes it possible to observe the evolution of each category as the number of documents grows, and similar methods can be applied in other settings. Because of the curriculum and courses offered, the growth of each discipline of nuclear science varies within the study program.
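The Cosine-binary kNN combination described in this abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the tokenizer is a plain whitespace split (no stemming, stop word filter, or n-grams), and the function names `binary_vector`, `cosine_binary`, and `knn_predict` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def binary_vector(text):
    """Binary term occurrence: the set of distinct lowercase tokens.

    Assumption: whitespace tokenization stands in for the study's
    stemming, stop word, n-gram, and length filters.
    """
    return set(text.lower().split())


def cosine_binary(a, b):
    """Cosine similarity between two binary vectors stored as token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def knn_predict(labeled, query, k=4):
    """Majority vote over the k labeled documents most similar to query.

    `labeled` is a list of (text, label) pairs; k=4 mirrors the value
    reported in the abstract.
    """
    q = binary_vector(query)
    ranked = sorted(
        labeled,
        key=lambda item: cosine_binary(binary_vector(item[0]), q),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With binary term occurrence, the dot product reduces to the number of shared terms and each vector norm to the square root of its term count, which is why sets are enough here.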
Bibliometrics is increasingly used by the knowledge community and librarians to analyze patterns in knowledge. In practice, data from databases that provide bibliometric information are not always clean, so pre-processing is required. Several previous studies have shown that bibliometric analysis begins with a simple pre-processing step. The goal of this research is to use text mining for pre-processing in order to find the basic terms of the keywords that appear, that is, to construct a controlled vocabulary for a bibliographic dataset. The method used in this study is cleaning keywords by stemming with RapidMiner software. Bibliometrix was used to compare the results. A total of 85 keywords were merged into basic terms. Using the built process, this study finds that the network built from raw data differs from the network built from pre-processed data, which leads to differences in the resulting analysis. The built process can also be reused in a variety of real-world situations.
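The idea of merging keyword variants into a basic term can be sketched as follows. This is only an illustration under loud assumptions: `crude_stem` is a hypothetical, deliberately tiny plural-stripping rule, not the stemmer used in RapidMiner, and `merge_keywords` is a hypothetical helper name.

```python
from collections import defaultdict


def crude_stem(word):
    """Toy stemmer (assumption): handles only simple English plurals."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def merge_keywords(keywords):
    """Group raw keywords whose word-by-word stems are identical."""
    groups = defaultdict(list)
    for keyword in keywords:
        key = " ".join(crude_stem(w) for w in keyword.split())
        groups[key].append(keyword)
    return dict(groups)
```

Each key of the returned dictionary plays the role of a controlled-vocabulary term, and its value lists the raw keywords that would be collapsed into it before building a co-occurrence network.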
Background of study: Librarian competency development through training in Sidenreng Rappang Regency, South Sulawesi Province, was carried out by dividing librarians according to the location of their agency's work area. In practice, the training faces barriers, namely differences in how well participants absorb the material, caused by limited training time and by differences in initial knowledge of the training material. Purpose: This study attempts to determine the best grouping and number of participants for future training, based on the librarians' prior knowledge of the training material. Method: The methodology used in this research is the Cross Industry Standard Process for Data Mining (CRISP-DM), which consists of 6 stages. Data were collected with a questionnaire using a linear numerical scale from 0 to 10, given to 97 librarians in Sidenreng Rappang Regency. Data were analyzed using the K-Means algorithm to determine the number of groups and the number of librarians in each group, and evaluated using the Davies-Bouldin index (DBI) to determine the optimal group division. Findings: According to this study, the best number of groups for training in the processing of library materials is two, with a DBI value of 0.68983. For the library promotion training, the best number of groups is two, with a DBI value of 0.69431. Conclusion: For the library service training, the best number of groups is two, with a DBI value of 0.65698. Meanwhile, for INLISLite-based automation training, the best number of groups is two, with a DBI value of 0.65500.
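The K-Means plus Davies-Bouldin evaluation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the seeded random initialization, the Euclidean distance, and the function names `kmeans` and `davies_bouldin` are assumptions, and points are tuples of scores.

```python
import math
import random


def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's algorithm on tuples of floats (assumed setup).

    Returns (centroids, clusters); clusters[i] holds the points assigned
    to centroids[i] once the centroids stop changing.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if a cluster happens to be empty.
        updated = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def davies_bouldin(centroids, clusters):
    """Davies-Bouldin index over non-empty clusters; lower is better.

    Needs at least two non-empty clusters with distinct centroids.
    """
    pairs = [(c, cl) for c, cl in zip(centroids, clusters) if cl]
    scatter = [sum(math.dist(p, c) for p in cl) / len(cl) for c, cl in pairs]
    n = len(pairs)
    total = 0.0
    for i in range(n):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(pairs[i][0], pairs[j][0])
            for j in range(n)
            if j != i
        )
    return total / n
```

To choose the number of groups, one would run `kmeans` for several candidate values of k and keep the k with the lowest `davies_bouldin` value, which matches how the abstract reports a DBI per training topic.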
Data visualization is a series of data processing steps that produce information in a dynamic visual form. Many libraries around the world have embraced data visualization as a tool to support key decisions, but few have discussed how libraries might employ this technology to support their users' economic growth. This article discusses the closeness of data visualization to everyday library work, the opportunities for libraries as services based on social inclusion, and the challenges that arise for librarians. The method used is a literature study of sources related to data visualization, business, libraries, and librarian competencies. The result is that the concept of data visualization resembles daily activities in the library, namely repackaging information and compiling statistics. Data visualization helps librarians transform data into actionable information for decision-making and deliver it to users. Data visualization opens up opportunities for libraries to serve as services based on social inclusion that support the economic growth of their users. The challenge for librarians is a limited skill set for this technology. To develop their skills, librarians are urged to enroll in education and training programs.