Recommender systems filter unseen information to predict whether a particular user would prefer a given item. Over the years, this process has depended on data mining and machine learning techniques, which are known to scale poorly when applied to recommender systems. In this paper, we propose a k-means clustering-based recommendation algorithm that addresses the scalability issues associated with traditional recommender systems. One problem with traditional k-means clustering algorithms is that they choose the initial k centroids randomly, which leads to inaccurate recommendations and increases the cost of offline cluster training. This paper shows how centroid selection in k-means-based recommender systems can improve performance while also reducing cost. The proposed centroid selection method exploits underlying data correlation structures and has been shown to deliver superior accuracy and performance compared with traditional strategies that choose centroids randomly. The proposed approach has been validated with an extensive set of experiments on five different datasets (from the movie, book, and music domains). These experiments show that the proposed approach produces better-quality clusters and converges more quickly than existing approaches, which in turn improves the accuracy of the recommendations provided.
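The abstract does not spell out the proposed centroid selection method, so it cannot be reproduced here. As a rough illustration of why seeding matters, the sketch below swaps in a different, well-known technique, k-means++ style seeding (each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen), and compares it with uniform random seeding. The function names and the toy 2-D points are assumptions made for the example, not part of the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_seeds(points, k, rng):
    """Baseline: pick k distinct points uniformly at random."""
    return rng.sample(points, k)


def kmeans_pp_seeds(points, k, rng):
    """k-means++ style seeding (NOT the paper's method): spread seeds apart."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen seed.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns (final centroids, iterations used)."""
    centroids = list(centroids)
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter


# Hypothetical toy data: three well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
              (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]

rng = random.Random(0)
for seeding in (random_seeds, kmeans_pp_seeds):
    seeds = seeding(toy_points, 3, rng)
    final, iterations = kmeans(toy_points, seeds)
    print(seeding.__name__, "iterations:", iterations)
```

On real rating data the points would be user or item vectors rather than 2-D coordinates, and the number of iterations saved by better seeding would have to be measured rather than assumed.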
Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report an analysis of the level of genome sequencing possible as a function of read length. We show that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (contiguous fragments of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.
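One way to build intuition for the dependence on read length is to ask what fraction of the reads of a given length occur only once in a sequence, since a read that occurs more than once cannot be placed unambiguously. The sketch below is a toy illustration under simplifying assumptions (a single strand, error-free reads, one read starting at every position); it is not the analysis performed by the authors, and the function name and example sequence are hypothetical.

```python
from collections import Counter


def unique_read_fraction(genome, read_length):
    """Fraction of all length-`read_length` windows that occur exactly once."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    if not reads:
        return 0.0
    counts = Counter(reads)
    unique = sum(1 for read in reads if counts[read] == 1)
    return unique / len(reads)


# Hypothetical toy sequence containing a repeated "ACGT" motif.
toy_genome = "ACGTACGTTT"
for length in (2, 5):
    print(length, unique_read_fraction(toy_genome, length))
```

In this toy sequence only one of the nine 2-nt reads is unique, while all six 5-nt reads are. A real genome would also require handling the reverse complement strand and sequencing errors.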
Abstract. This paper describes the advantages of using the anomaly detection approach over the misuse detection technique in detecting unknown network intrusions or attacks. It also investigates the performance of various clustering algorithms when applied to anomaly detection. Five different clustering algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering, and distance-based outlier detection. Our experiment shows that misuse detection techniques, implemented with four different classifiers (naïve Bayes, rule induction, decision tree, and nearest neighbour), failed on network traffic that contained a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. The anomaly detection module, on the other hand, showed promising results, with the distance-based outlier detection algorithm outperforming the other algorithms at an accuracy of 80.15%. The accuracy was 78.06% for EM clustering, 76.71% for k-Medoids, 65.40% for improved k-Means, and 57.81% for k-Means. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Our future work will therefore focus on reducing the false positive rate and improving the accuracy using more advanced machine learning techniques.

Keywords: k-Means, EM clustering, k-medoids, intrusion detection system, anomaly detection, outlier detection

Introduction

Intrusion detection is a process of gathering intrusion-related knowledge occurring in the process of monitoring events and analyzing them for signs of intrusion [1] [5]. There are two basic IDS approaches: misuse detection (signature-based) and anomaly detection. The misuse detection system uses patterns of well-known attacks to match and identify known intrusions. It performs pattern matching between the captured network traffic and attack signatures. If a match is detected, the system generates an alarm. The main advantage of the signature detection paradigm is that it can accurately detect instances of known attacks. The main disadvantage is that it lacks the ability to detect new intrusions or zero-day attacks [2][3].
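The excerpt does not give the details or parameters of the distance-based outlier detection algorithm that performed best in the abstract. A minimal sketch of the general idea, assuming a simple neighbour-count rule (a record is flagged when fewer than `min_neighbors` other records lie within `radius` of it), might look like the following. The function name, parameters, and toy feature vectors are assumptions made for illustration.

```python
import math


def distance_based_outliers(points, radius, min_neighbors):
    """Return one boolean per point: True if the point looks like an outlier.

    A point is flagged when fewer than `min_neighbors` OTHER points lie
    within Euclidean distance `radius` of it (brute-force O(n^2) scan).
    """
    flags = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        flags.append(neighbors < min_neighbors)
    return flags


# Hypothetical 2-D feature vectors: four similar records and one far away.
toy_traffic = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(distance_based_outliers(toy_traffic, radius=2.0, min_neighbors=2))
```

Unlike signature matching, a rule of this kind needs no prior description of an attack, which is why it can flag previously unseen behaviour; the choice of `radius` and `min_neighbors` directly trades detection rate against false positives.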