In term-based clustering with the vector space model, the high dimensionality of the vector space, which grows with the number of distinct words, is a recurring problem. It degrades clustering performance because the distances between points tend toward the same value. The dimensionality can be reduced by decreasing the number of words through stemming. Here, stemming was used as a term selection step to reduce the many terms generated during preprocessing. Applying the enhanced confix stripping stemmer reduced the terms to be processed from 199,358, produced from 108 text documents, to 5,476 after stemming. This reduction speeds up processing and saves storage. The clustering results were evaluated using a confusion matrix, and the accuracy of the experiment increased.
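The claim that distances "tend to have the same value" in high dimensions can be illustrated with a small experiment. The following is a minimal sketch under stated assumptions, not the paper's setup: it uses uniformly random points rather than document vectors, and the function name `relative_contrast`, the point count, and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one random query point.

    A small value means all points look almost equally far away,
    which is what makes distance-based clustering hard.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


low = relative_contrast(dim=2)      # few dimensions: clear near/far separation
high = relative_contrast(dim=1000)  # many dimensions: distances concentrate
print(f"relative contrast, dim=2: {low:.2f}; dim=1000: {high:.2f}")
```

The contrast shrinks as the dimension grows, which is why reducing the number of terms (here, by stemming) can help a distance-based clustering method.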
Speed of service, precision, data accuracy, ease of delivering information, and accountability are very important reasons for implementing an information system. Universitas Semarang (USM) is a private university in Semarang with the second-largest student body in Central Java, and it is one of the universities that is growing rapidly. The large number of students gives USM a great responsibility for their education, so that they become graduates who are ready to work according to the needs of business and industry. USM tracer data from 2019 on horizontal alignment, that is, how closely the field of study relates to alumni's jobs, show that there is still a mismatch between graduates' abilities and stakeholders (not at all = 1.6%, low = 19.2%, and fairly high = 27.5%). This is a particular concern for the university in revising its strategy so that these percentages decrease. The Apriori algorithm is the best-known algorithm for finding high-frequency patterns. These patterns are also used to build association rules and in several other data mining techniques. Rules that state associations between attributes are often called affinity analysis or market basket analysis. Applying the Apriori algorithm to the Universitas Semarang tracer data with a minimum support of 50% and a minimum confidence of 100% produced 4 rules. These four rules show that modeling with the Apriori algorithm can produce several rule formations, which the university can use as an evaluation when planning its next steps; the rules differ because each relationship between graduates and stakeholders has its own reference points and style.
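The roles of minimum support and minimum confidence can be sketched with a compact Apriori implementation. This is an illustrative sketch only: the transactions and attribute names below are invented and are not the USM tracer data, so the rules it prints say nothing about the study's four rules. Only the thresholds (50% support, 100% confidence) are taken from the abstract.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while level:
        kept = {c: support(c) for c in level}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: merge surviving itemsets into candidates of size k.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]  # confidence = sup(lhs U rhs) / sup(lhs)
                if conf >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, conf))
    return found


# Hypothetical alumni records, one set of attribute values per graduate.
transactions = [
    {"field=match", "competence=high", "hired=fast"},
    {"field=match", "competence=high", "hired=fast"},
    {"field=match", "competence=low"},
    {"hired=fast"},
]
frequent = apriori(transactions, min_support=0.5)
found = association_rules(frequent, min_confidence=1.0)
for lhs, rhs, sup, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With a confidence threshold of 100%, a rule survives only if every record containing the left-hand side also contains the right-hand side, which is why so few rules remain even when many itemsets are frequent.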
Stemming words to remove suffixes has applications in text search, machine translation, document summarization, and text classification. For example, Indonesian stemming reduces the words "kebaikan", "perbaikan", "memperbaiki" and "sebaikbaiknya" to their common morphological root "baik". In text search, this permits a search for "player" to find documents containing all words with the stem "play". In the Indonesian language, stemming is of crucial importance: words have prefixes, suffixes, infixes, and confixes that make matching related words difficult. This research proposed a stemmer that produces more accurate results by employing an algorithm that yields more than one candidate word and more than one affix combination. The new algorithm is called the CAT stemming algorithm. Here, the results did not depend on the order of the morphological rules: all rules were checked, and the resulting words were kept in a candidate list. To make the stemmer efficient, two kinds of word lists (vocabularies) were used: a list of words that have more than one candidate word, and a list of root words used as the candidate reference. The final result was selected using several rules. The experiments showed that this strategy achieved higher accuracy than the two best-known Indonesian stemmers it was compared against.
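The idea of checking every rule and keeping all results in a candidate list can be sketched as follows. This is not the CAT algorithm itself, whose rules are not given here: the affix lists are abbreviated, the root lexicon `ROOTS` is a hypothetical three-word stand-in, the limit of two prefixes and one suffix is an arbitrary simplification, and the "prefer the longest root" selection rule is an assumption. Reduplicated forms such as "sebaikbaiknya" are not handled.

```python
# Hypothetical, heavily abbreviated resources; a real stemmer needs full lists.
ROOTS = {"baik", "ajar", "main"}
PREFIXES = ("meng", "mem", "me", "ber", "bel", "per", "pel", "pe", "ke", "se", "di")
SUFFIXES = ("kan", "nya", "an", "i")


def candidates(word, prefixes_left=2, suffixes_left=1):
    """Collect every string reachable by stripping affixes in any order.

    No rule takes priority: each applicable prefix and suffix is tried,
    and all intermediate results are kept as candidates.
    """
    found = {word}
    if prefixes_left:
        for prefix in PREFIXES:
            if word.startswith(prefix) and len(word) > len(prefix):
                found |= candidates(word[len(prefix):], prefixes_left - 1, suffixes_left)
    if suffixes_left:
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                found |= candidates(word[:-len(suffix)], prefixes_left, suffixes_left - 1)
    return found


def stem(word):
    """Pick a root from the candidate list, or return the word unchanged."""
    valid = [c for c in candidates(word) if c in ROOTS]
    # Assumed selection rule: longest dictionary root, ties broken alphabetically.
    return max(valid, key=lambda c: (len(c), c)) if valid else word


for w in ("kebaikan", "perbaikan", "memperbaiki", "belajar", "permainan"):
    print(w, "->", stem(w))
```

Because all affix combinations are explored before a root is chosen, the result does not depend on the order in which the affix rules are listed, which is the property the abstract emphasizes.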
The match between an article's content and the article theme is the main factor in whether an article is accepted. Many authors still find it difficult to determine which theme suits their article. For that reason, a document classification algorithm is needed that can group articles automatically and accurately. Many classification algorithms can be used. This study uses naive Bayes, with the k-nearest neighbor algorithm as the baseline. Naive Bayes was chosen because it can produce high accuracy with little training data, while k-nearest neighbor was chosen because it is robust against noisy data. The performance of the two algorithms is compared to see which classifies documents better. The results show that naive Bayes performs better, with an accuracy of 88%, while k-nearest neighbor has a fairly low accuracy of 60%.
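The two classifiers being compared can be sketched in a few lines each. This is a minimal illustration under assumptions, not the study's implementation: the training documents and theme labels are invented, the naive Bayes variant is multinomial with add-one smoothing, and the k-nearest neighbor baseline uses Jaccard similarity on word sets with k = 3. It does not reproduce the reported accuracies.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns the counts needed for prediction."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def predict_knn(docs, tokens, k=3):
    """Majority vote among the k training documents most similar to tokens."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    ranked = sorted(docs, key=lambda d: jaccard(d[0], tokens), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training articles with made-up theme labels.
train = [
    ("stemming indonesian words affix".split(), "nlp"),
    ("text classification naive bayes".split(), "nlp"),
    ("document clustering term vector".split(), "nlp"),
    ("network router packet latency".split(), "networking"),
    ("wireless network signal bandwidth".split(), "networking"),
    ("packet routing protocol".split(), "networking"),
]
model = train_nb(train)
query = "indonesian text stemming".split()
print("naive Bayes:", predict_nb(model, query))
print("k-NN:", predict_knn(train, query))
```

On a real corpus the two methods would be compared on held-out articles, for example with a confusion matrix per theme, rather than on a single query.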