Teachers deal with plagiarism on a regular basis, so they try to prevent and detect it, a task that is complicated by the large size of some classes. Students who cheat often try to hide their plagiarism (obfuscate), and many different similarity detection engines (often called plagiarism detection tools) have been built to help teachers. This article focuses only on plagiarism detection and presents a detailed systematic review of the field of source-code plagiarism detection in academia. The review gives an overview of definitions of plagiarism, plagiarism detection tools, comparison metrics, obfuscation methods, datasets used for comparison, and algorithm types. Perspectives on the meaning of source-code plagiarism detection in academia are presented, together with categorisations of the available detection tools and analyses of their effectiveness. In the course of the review, several insights emerged about metrics and datasets for quantitative tool comparison and about the categorisation of detection algorithms. In addition, existing classifications of obfuscation methods are expanded, and a new definition of “source-code plagiarism detection in academia” is proposed.
In this work, it is shown that student access time series generated from Moodle log files contain sufficient information for successful prediction of students' final results in blended learning courses. It is also shown that if a time series is transformed into the frequency domain using the discrete Fourier transform (DFT), the information it contains is preserved. Hence, the resulting periodogram and its DFT coefficients can be used to generate student performance models with the algorithms commonly used for that purpose. The amount of data extracted from log files, especially for lengthy courses, can be huge. Nevertheless, the DFT makes drastic compression of the data possible. It is shown experimentally, by means of several commonly used modelling algorithms, that if, on average, all but the 5–10% most intensive and most frequently used DFT coefficients are removed from the datasets, modelling with the remaining data increases model accuracy. The accuracy of the resulting models is in accordance with results reported in the literature for student performance models calculated on different dataset types. The advantage of this approach is its applicability, because the data are collected automatically in Moodle logs.
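The transformation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the access counts are invented, the function names are hypothetical, and the selection rule (keep the strongest 10% of coefficients by power for a single student) is a simplifying assumption standing in for the paper's "most intensive and most frequently used" criterion, which is applied across a whole dataset.

```python
import cmath
import math


def dft(series):
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def periodogram(series):
    """Power of each DFT coefficient, |X_k|^2 / n."""
    n = len(series)
    return [abs(c) ** 2 / n for c in dft(series)]


def keep_strongest(power, fraction=0.1):
    """Zero all but the strongest `fraction` of coefficients (at least one is kept).

    Assumption: "most intensive" is read here as "largest power".
    """
    keep = max(1, round(len(power) * fraction))
    ranked = sorted(range(len(power)), key=lambda k: power[k], reverse=True)
    strongest = set(ranked[:keep])
    return [p if k in strongest else 0.0 for k, p in enumerate(power)]


# Hypothetical daily access counts for one student over four weeks (28 days),
# with a weekly rhythm: activity on two days of each week, none otherwise.
daily_access = [5, 0, 0, 3, 0, 0, 0] * 4

power = periodogram(daily_access)
features = keep_strongest(power, fraction=0.1)
```

Here `features` has the same length as the input series but only three non-zero entries, which could then be passed as a compact feature vector to any of the usual classifiers. For real course logs a library FFT would be a more practical choice than the naive transform used in this sketch.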