Source-code plagiarism detection is an unfortunate but necessary activity when reviewing assignments in programming courses. String-based comparisons, while reasonably easy to fool, offer a high degree of accuracy with almost no false positives, and a good similarity metric for two strings is usually the length of their longest common subsequence. Unfortunately, the dynamic programming algorithm that computes this length takes quadratic time, even when the two strings are equal. In this paper we present an algorithm that, given a batch of source-code files, efficiently finds all pairs of similar files by preprocessing the files and then using a fast branch-and-bound algorithm to find only those pairs whose longest common subsequence is indicative of plagiarism.
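To make the cost problem concrete, here is a minimal sketch, not the paper's algorithm. It pairs the classic quadratic LCS dynamic program with a cheap upper bound that lets a batch comparison skip pairs that cannot possibly be similar. Several things are assumptions for illustration only: the similarity score LCS / max(len(a), len(b)), the symbol-count bound, and the helper names `lcs_length`, `lcs_upper_bound` and `similar_pairs`. Files are compared as raw strings here, and no preprocessing step is modeled.

```python
from collections import Counter
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic program for the LCS length."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcs_upper_bound(a: str, b: str) -> int:
    """Cheap bound: an LCS uses each symbol at most min(count in a, count in b) times."""
    ca, cb = Counter(a), Counter(b)
    return sum(min(n, cb[sym]) for sym, n in ca.items())


def similar_pairs(files: dict[str, str], threshold: float) -> list[tuple[str, str, float]]:
    """Report pairs whose (assumed) normalized LCS similarity reaches `threshold`."""
    results = []
    for (na, a), (nb, b) in combinations(files.items(), 2):
        longest = max(len(a), len(b))
        if longest == 0:
            continue
        # Prune: skip the quadratic DP when even the upper bound misses the threshold.
        if lcs_upper_bound(a, b) / longest < threshold:
            continue
        score = lcs_length(a, b) / longest
        if score >= threshold:
            results.append((na, nb, score))
    return results


files = {"a.py": "ABCBDAB", "b.py": "BDCABA", "c.py": "xyz"}
print(similar_pairs(files, 0.5))  # only ("a.py", "b.py") with score 4/7
```

The bound is linear in the file sizes, so pairs that share few symbols are rejected without running the quadratic DP. In the example, `c.py` shares no characters with the other files and is never compared in full.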
The plowing with precedence problem is a variant of the windy postman problem in which a plow must clear streets after a heavy snowfall, and the cost of traversing a street depends both on the direction of traversal and on whether the street has already been plowed. We prove that this problem can be solved in polynomial time under natural cost structures. We also propose heuristics for this problem that compare favorably with the state of the art.
Biological network analysis is used to interpret modern high-throughput biomedical data sets in terms of biological functions and pathways. However, the results depend heavily on the topological characteristics of the underlying network, which is commonly composed of nodes representing genes or proteins, connected by edges when they interact. In this study, we build biological networks that account for small molecules, protein isoforms and post-translational modifications. We show how these additions change the global structure of the network and alter the connectedness of pathway-based networks. Our findings highlight the importance of carefully constructing networks for network analysis so that they better represent the reality of biological systems.
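The toy sketch below shows how node granularity alone can change connectedness, under purely hypothetical data. All node names (`P1`, `M1`, `G-1` and so on) and interactions are invented and are not taken from the study. Adding a shared small-molecule node can merge two otherwise separate protein components. Splitting a gene into isoforms that each carry only some of its interactions can break one component into two.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components


# Hypothetical protein-only network: two interactions that never touch.
proteins = ["P1", "P2", "P3", "P4"]
protein_edges = [("P1", "P2"), ("P3", "P4")]
# Hypothetical extension: a shared small molecule M1 bridges the two pairs.
with_molecule = (proteins + ["M1"], protein_edges + [("P2", "M1"), ("M1", "P3")])

# Hypothetical gene-level network: gene G interacts with both A and B.
gene_level = (["G", "A", "B"], [("G", "A"), ("G", "B")])
# Same interactions at isoform level: each isoform carries only one of them.
isoform_level = (["G-1", "G-2", "A", "B"], [("G-1", "A"), ("G-2", "B")])

print(len(connected_components(proteins, protein_edges)))  # 2
print(len(connected_components(*with_molecule)))           # 1
print(len(connected_components(*gene_level)))              # 1
print(len(connected_components(*isoform_level)))           # 2
```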