As software plagiarism increases rapidly, there is a growing need for software plagiarism detection systems. In this paper, we propose a software plagiarism detection system using an API-labeled control flow graph (A-CFG) that abstracts the functionalities of a program. The A-CFG reflects both the sequence and the frequency of APIs, whereas previous work rarely considers both together. To compare a pair of A-CFGs scalably, we use random walk with restart (RWR), which computes an importance score for each node in a graph. With RWR, we generate a single score vector for each A-CFG and compare A-CFGs by comparing their score vectors. Extensive evaluations on a set of Windows applications demonstrate the effectiveness and scalability of the proposed system compared with existing methods.
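The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the uniform restart distribution, the restart probability of 0.15, the aggregation of node scores by API label, and the use of cosine similarity are all assumptions, since the abstract does not specify them. The helper names (`rwr_scores`, `api_score_vector`, `cosine`) and the toy graphs are hypothetical.

```python
import math


def rwr_scores(edges, restart=0.15, tol=1e-12, max_iter=1000):
    """Random walk with restart by power iteration.

    edges: dict mapping a node to the list of its successor nodes.
    Assumption: the walker restarts to a uniformly chosen node.
    Returns a dict mapping each node to its stationary score.
    """
    nodes = sorted(set(edges) | {v for succ in edges.values() for v in succ})
    base = 1.0 / len(nodes)
    scores = {v: base for v in nodes}
    for _ in range(max_iter):
        # Every node receives its share of the restart probability.
        nxt = {v: restart * base for v in nodes}
        for u in nodes:
            succ = edges.get(u, [])
            mass = (1.0 - restart) * scores[u]
            if succ:
                # Spread the remaining mass evenly over outgoing edges.
                for v in succ:
                    nxt[v] += mass / len(succ)
            else:
                # Node without successors: treat the step as a restart.
                for v in nodes:
                    nxt[v] += mass * base
        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


def api_score_vector(edges, labels, **kwargs):
    """Sum node scores per API label (an assumed way to build one vector)."""
    vector = {}
    for node, score in rwr_scores(edges, **kwargs).items():
        api = labels[node]
        vector[api] = vector.get(api, 0.0) + score
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy A-CFGs; node ids and API labels are illustrative only.
edges_a = {0: [1], 1: [2, 3], 2: [1], 3: []}
labels_a = {0: "CreateFileW", 1: "ReadFile", 2: "WriteFile", 3: "CloseHandle"}
edges_b = {0: [1], 1: [2], 2: [3], 3: []}
labels_b = {0: "CreateFileW", 1: "ReadFile", 2: "ReadFile", 3: "CloseHandle"}

vec_a = api_score_vector(edges_a, labels_a)
vec_b = api_score_vector(edges_b, labels_b)
similarity = cosine(vec_a, vec_b)
```

Because each graph is reduced to one fixed-size vector, comparing two programs costs a single vector comparison rather than a graph matching, which is the scalability argument the abstract makes.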
Recently, deep learning has become a preferred choice for tasks in diverse application domains such as computer vision, natural language processing, sensor data analytics for healthcare, and collaborative filtering for personalized item recommendation. In addition, generative adversarial networks (GANs) have become one of the most popular frameworks for training machine learning models. Motivated by the success of GANs and deep learning in a wide range of fields, this paper explores an effective way to apply both techniques to collaborative filtering for accurate recommendation. IRGAN and GraphGAN are pioneering methods that successfully apply GANs to recommender systems. However, both employ standard matrix factorization (MF) as their basic model, which is linear and unable to capture the subtle, non-linear latent factors underlying user-item interactions. Our proposed recommendation framework, named Collaborative Adversarial Autoencoders (CAAE), significantly extends IRGAN and GraphGAN in three ways: 1) we use an autoencoder, one of the most successful deep neural networks, as our generator instead of the MF model; 2) we employ Bayesian personalized ranking (BPR) as our discriminative model; and 3) we incorporate a second generator into our framework that focuses on generating negative items, that is, items that a given user may not be interested in. We empirically test our framework on three real-life datasets with four evaluation metrics.
Owing to these extensions, our proposed framework not only achieves considerably higher recommendation accuracy than the conventional GAN-based recommenders (i.e., IRGAN and GraphGAN), but also outperforms other state-of-the-art top-N recommenders (i.e., BPR, PureSVD, and FISM).
INDEX TERMS Collaborative filtering, deep learning, generative adversarial networks, recommender systems.
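For readers unfamiliar with Bayesian personalized ranking, the pairwise objective it optimizes can be sketched as below. This shows only the standard BPR loss, -ln σ(s_pos - s_neg), averaged over (positive, negative) item pairs. How CAAE wires this loss into its discriminator and how its two generators supply the items is not described in the abstract, so the function name `bpr_loss` and the example scores are hypothetical.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean BPR loss over paired positive and negative item scores.

    Each pair contributes -ln(sigmoid(pos - neg)), computed here as
    softplus(neg - pos) in a numerically stable form.
    """
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("score lists must be non-empty and equal in length")
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        if diff >= 0:
            total += math.log1p(math.exp(-diff))
        else:
            total += -diff + math.log1p(math.exp(diff))
    return total / len(pos_scores)


# Hypothetical predicted scores for one user: items the user interacted
# with (positive) paired with items assumed to be of no interest (negative).
well_ranked = bpr_loss([2.0, 1.5], [0.1, -0.3])
badly_ranked = bpr_loss([0.1, -0.3], [2.0, 1.5])
```

The loss shrinks as positive items are scored further above negative ones, so `well_ranked` is smaller than `badly_ranked`.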