2011
DOI: 10.1007/978-3-642-22327-3_31

Towards the Detection of Cross-Language Source Code Reuse

Abstract: The Internet has made huge amounts of information available, including source code. Source code repositories and, in general, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a nearly uninvestigated problem. Our preliminary experiments, based on character n-gram comparison, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. When consider…

Cited by 29 publications (16 citation statements)
References 5 publications
“…Cross-language plagiarism detection is discussed in the paper "Towards the detection of cross-language source code reuse" (Flores et al., 2011), whose authors found that methods applied to natural text (specifically n-gram comparison) work for Java, C and Python too. Another method might be the comparison of intermediate code produced by a special compiler suite.…”
Section: Theoretical Framework (mentioning)
confidence: 99%
“…Even as the scope has become broader, plagiarism remains one of the most important academic integrity issues appearing in student assignments undertaken individually or in groups, without direct supervision. In the digital era, massive amounts of information are available to reuse for anyone struggling with an assignment who is tempted to plagiarise (Flores et al, 2011).…”
Section: Introduction (mentioning)
confidence: 99%
“…Such methods, as stated in [21], involve more complex as well as more robust approaches. Normally, source code files are treated as text files; hence, common methods such as the traditional Bag-of-Words, character n-grams [12,22], and longest common subsequence [2,15] are among the most popular techniques. One such work takes into account the "whitespace" indentation patterns of a source code file [2], where a source code document is converted to a pattern, namely the whitespace format, by replacing any visible character with X and any whitespace with S, and leaving newlines as they appear.…”
Section: Related Work (mentioning)
confidence: 99%
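
The whitespace-format conversion described in the statement above can be sketched in a few lines of Python. This is only an illustration of the stated rule (visible character to X, whitespace to S, newlines kept); the function name `whitespace_format` is hypothetical, and treating tabs and carriage returns as ordinary whitespace is an assumption, not something the cited work specifies.

```python
def whitespace_format(source: str) -> str:
    """Convert source code to its whitespace pattern.

    Newlines are kept as they appear, any other whitespace character
    becomes 'S', and every visible character becomes 'X'.
    Assumption: tabs and carriage returns count as whitespace ('S').
    """
    pattern = []
    for ch in source:
        if ch == "\n":
            pattern.append("\n")   # newlines are left untouched
        elif ch.isspace():
            pattern.append("S")    # spaces, tabs, other whitespace
        else:
            pattern.append("X")    # any visible character
    return "".join(pattern)


if __name__ == "__main__":
    print(whitespace_format("if x:\n\treturn 1"))
```

Two files with the same indentation layout yield the same pattern regardless of the identifiers or keywords used, which is what makes the representation usable for comparison.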
“…Consists of the character 3-gram based model proposed in [12]. In this model, the source code is considered as a text and represented as character 3-grams, where these n-grams are weighted using term frequency scheme.…”
Section: </Document> (mentioning)
confidence: 99%
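
As a rough illustration of the character 3-gram model with term-frequency weighting mentioned above, the following Python sketch builds a 3-gram frequency profile for each source file and compares two profiles. The statement does not say how profiles are compared, so the use of cosine similarity here is an assumption, as are the helper names `char_ngrams` and `cosine_similarity` and the absence of any preprocessing.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Term-frequency profile: count of every character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram frequency vectors.

    Assumption: cosine is used only for illustration; the cited
    statement does not name the comparison measure.
    """
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with any other
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    java_snippet = "for (int i = 0; i < n; i++) { sum += i; }"
    c_snippet = "for (i = 0; i < n; i++) { sum += i; }"
    score = cosine_similarity(char_ngrams(java_snippet), char_ngrams(c_snippet))
    print(f"similarity: {score:.3f}")
```

Because the profile is built from raw characters, the same code runs unchanged on Java, C or Python files; no language-specific parser is involved.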
“…On the other hand, extrinsic systems rely on a collection of trusted source codes against which the suspicious code is compared. In this way, they try to detect whether any of the trusted source codes have been reused, or even whether the complete code of one or several of them has been reused [6,7].…”
Section: Background and State of the Art (unclassified)