2019
DOI: 10.1002/cpe.5416
|View full text |Cite
|
Sign up to set email alerts
|

Searching source code fragments using incremental clustering

Abstract: Summary Plagiarism is becoming an increasingly serious problem in academic environment. In this paper, we deal with a specific kind of plagiarism: source code plagiarism. In this case, there is no software available for detecting plagiarism on a larger scale (hundreds of student submissions every year). We propose algorithms for source code parsing and processing as a part of a complex system for plagiarism detection. A source code vectorization using characteristic vectors is a vital piece of the whole proces… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
6
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
3
2

Relationship

1
4

Authors

Journals

citations
Cited by 6 publications
(6 citation statements)
references
References 25 publications
0
6
0
Order By: Relevance
“…Incremental clustering is based on the fact that if we already have clusters with a sufficient number of vectors, the addition of one vector to this system will not cause a fundamental change in the distribution of clusters. Based on our experiments [29], with a sufficiently large initial dataset, the addition of a single vector will cause a change in cluster distribution in a very small number of cases. With a sufficiently large number of vectors, these small shifts accumulate.…”
Section: Figure 8 Incremental Clustering Schemementioning
confidence: 96%
See 4 more Smart Citations
“…Incremental clustering is based on the fact that if we already have clusters with a sufficient number of vectors, the addition of one vector to this system will not cause a fundamental change in the distribution of clusters. Based on our experiments [29], with a sufficiently large initial dataset, the addition of a single vector will cause a change in cluster distribution in a very small number of cases. With a sufficiently large number of vectors, these small shifts accumulate.…”
Section: Figure 8 Incremental Clustering Schemementioning
confidence: 96%
“…In the second phase, the clustering of similar vectors occurs due to higher search efficiency. To cluster similar vectors, we use the known and widely used K-Means algorithm, which we modify for our purposes [29]. The result of this phase will be the data that are ready to be stored in a database in a form that allows them to be easily looked up.…”
Section: Figure 1 Structure Of the Designed Systemmentioning
confidence: 99%
See 3 more Smart Citations