Recommendation systems are essential tools for piquing consumers' interests and stimulating consumption in today's electronic commerce, and the quality of these systems depends on the filtering algorithms they employ. Improving the performance of these algorithms is therefore an important issue. In this paper, we design an intensity-based contraction (IC) algorithm that works in combination with other machine-learning algorithms in model-based collaborative filtering, which is currently the most popular filtering approach. The main challenges for model-based collaborative filtering are the sparseness of the database and a lack of scalability. To demonstrate how IC is used, we implemented IC clustering as an example; it effectively reduces the sparseness of the database and improves efficiency. Moreover, we built a scalable version of IC on the MapReduce model and demonstrated its scalability in actual experiments.
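The abstract does not define the IC algorithm itself, so the following is only a minimal sketch of the general idea it builds on: clustering users so that each user's missing ratings can be filled from similar users, which lowers the sparseness of the rating matrix. It assumes plain k-means over rating vectors (missing ratings encoded as 0.0) as a stand-in for IC clustering; the function names `kmeans`, `densify`, and `density` and the toy data are hypothetical.

```python
import random

def to_vector(user_ratings, n_items):
    # Assumption: a missing rating is encoded as 0.0 for distance computations.
    return [float(user_ratings.get(i, 0.0)) for i in range(n_items)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each user to the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def densify(ratings, n_items, k, seed=0):
    """Fill each user's missing ratings with the mean rating of their cluster."""
    users = sorted(ratings)
    labels = kmeans([to_vector(ratings[u], n_items) for u in users], k, seed=seed)
    filled = {u: dict(ratings[u]) for u in users}
    for c in set(labels):
        members = [u for u, lab in zip(users, labels) if lab == c]
        for i in range(n_items):
            observed = [ratings[u][i] for u in members if i in ratings[u]]
            if not observed:
                continue  # no one in this cluster rated item i; leave it missing
            mean = sum(observed) / len(observed)
            for u in members:
                filled[u].setdefault(i, mean)  # never overwrite a real rating
    return filled

def density(ratings, n_items):
    # Fraction of the user-item matrix that holds a rating.
    return sum(len(r) for r in ratings.values()) / (len(ratings) * n_items)

# Hypothetical toy data: user -> {item index: rating on a 1-5 scale}.
ratings = {
    "u1": {0: 5, 1: 4}, "u2": {0: 5, 2: 4}, "u3": {1: 5, 2: 5},
    "u4": {3: 5, 4: 4}, "u5": {3: 4, 5: 5}, "u6": {4: 5, 5: 4},
}
N_ITEMS = 6
filled = densify(ratings, N_ITEMS, k=2)
print(density(ratings, N_ITEMS), density(filled, N_ITEMS))
```

Whatever clusters k-means happens to find, real ratings are kept and only gaps are filled, so density can only rise; the abstract's IC clustering presumably replaces the plain k-means step with its own contraction scheme.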
With the rapid development of online education, the volume of education data traffic has increased dramatically, and security-relevant information can potentially be mined from it. Because the data volume is so large, data mining on a cloud computing platform can be used for malware detection. Online education institutions need to virtualize their data centers and build cloud infrastructure to use resources more efficiently, which means moving their data centers from physical machines (PMs) to virtual machines (VMs). However, this migration carries risks such as a loss of computing capacity and a decline in performance.
In this paper, we conduct a series of experiments to compare the performance of Hadoop-based data mining algorithms on physical machines and virtual machines. These experiments show that the performance of Hadoop-based data mining algorithms depends on Hadoop's disk I/O performance, and that Hadoop's disk I/O performance is better on PMs than on VMs. Iterative algorithms such as k-means require more disk I/O, so we do not recommend running them on VMs. Basic algorithms such as Bayesian classification require less disk I/O, so we recommend running them on VMs.
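To make the disk I/O argument concrete, here is a minimal sketch, assuming that each k-means iteration runs as a separate MapReduce-style job that rescans its full input, while naive Bayes training only needs one pass to collect per-class counts, sums, and sums of squares. The `CountingStore` class is a hypothetical stand-in for HDFS (a local temporary file whose bytes read are tallied); it ignores shuffle spills, replication, and caching, so it models input scans only.

```python
import os
import tempfile

class CountingStore:
    """Hypothetical stand-in for HDFS: a local file that tallies bytes read."""
    def __init__(self, rows):
        fd, self.path = tempfile.mkstemp(suffix=".csv")
        with os.fdopen(fd, "w") as f:
            for features, label in rows:
                f.write(",".join(map(str, features)) + f",{label}\n")
        self.bytes_read = 0

    def scan(self):
        # One full pass over the input, as one MapReduce job would perform.
        with open(self.path, "rb") as f:
            for line in f:
                self.bytes_read += len(line)
                *xs, label = line.decode().strip().split(",")
                yield [float(x) for x in xs], label

def kmeans_passes(store, centroids, iters):
    for _it in range(iters):
        sums = [[0.0] * len(c) for c in centroids]
        counts = [0] * len(centroids)
        for x, _label in store.scan():  # every iteration re-reads all input
            j = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            counts[j] += 1
            sums[j] = [s + a for s, a in zip(sums[j], x)]
        new_centroids = []
        for j, n in enumerate(counts):
            new_centroids.append([s / n for s in sums[j]] if n else centroids[j])
        centroids = new_centroids
    return centroids

def naive_bayes_stats(store):
    # Gaussian naive Bayes needs only per-class sufficient statistics: one pass.
    stats = {}
    for x, label in store.scan():
        n, s, sq = stats.get(label, (0, [0.0] * len(x), [0.0] * len(x)))
        stats[label] = (n + 1,
                        [a + b for a, b in zip(s, x)],
                        [a + b * b for a, b in zip(sq, x)])
    return stats

rows = [([float(i % 7), float(i % 3)], "a" if i % 2 else "b") for i in range(1000)]
store = CountingStore(rows)
file_size = os.path.getsize(store.path)

kmeans_passes(store, [[0.0, 0.0], [6.0, 2.0]], iters=10)
kmeans_bytes = store.bytes_read

store.bytes_read = 0
naive_bayes_stats(store)
nb_bytes = store.bytes_read
os.remove(store.path)

print(kmeans_bytes // file_size, nb_bytes // file_size)  # prints "10 1"
```

The input read by k-means grows linearly with the number of iterations, whereas naive Bayes reads it once, which is consistent with the abstract's observation that I/O-heavy iterative jobs suffer more from slower VM disk I/O.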