Searching source code fragments using incremental clustering

Duracik, Michal; Kršák, Emil; Hrkut, Patrik

doi:10.1002/cpe.5416

Cited by 6 publications

(6 citation statements)

References 25 publications


“…Incremental clustering is based on the fact that if we already have clusters with a sufficient number of vectors, the addition of one vector to this system will not cause a fundamental change in the distribution of clusters. Based on our experiments [29], with a sufficiently large initial dataset, the addition of a single vector will cause a change in cluster distribution in a very small number of cases. With a sufficiently large number of vectors, these small shifts accumulate.…”

Section: Figure 8 Incremental Clustering Scheme (mentioning)

confidence: 96%
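The idea in the statement above can be illustrated with a minimal Python sketch. It is not the cited authors' algorithm: the class name `IncrementalClusters`, the `drift_limit` threshold, and the use of accumulated centroid movement as the re-clustering trigger are all assumptions made for illustration. A new vector is assigned to its nearest existing centroid, the centroid is updated as a running mean, and the small shifts are summed so that a full re-clustering can be scheduled once they add up.

```python
import math


class IncrementalClusters:
    """Hypothetical sketch of incremental assignment on top of existing clusters."""

    def __init__(self, centroids, counts, drift_limit=0.5):
        # centroids and counts would come from an initial K-Means run (assumed).
        self.centroids = [[float(x) for x in c] for c in centroids]
        self.counts = list(counts)
        self.drift_limit = drift_limit  # illustrative threshold, not from the source
        self.drift = 0.0                # accumulated centroid movement

    def add(self, vector):
        """Assign one vector to the nearest centroid and update that centroid."""
        nearest = min(
            range(len(self.centroids)),
            key=lambda i: math.dist(self.centroids[i], vector),
        )
        self.counts[nearest] += 1
        n = self.counts[nearest]
        old = self.centroids[nearest]
        # Running-mean update: new_mean = old_mean + (x - old_mean) / n
        new = [c + (x - c) / n for c, x in zip(old, vector)]
        self.drift += math.dist(old, new)
        self.centroids[nearest] = new
        return nearest

    def needs_reclustering(self):
        """True once the accumulated small shifts exceed the chosen limit."""
        return self.drift > self.drift_limit


clusters = IncrementalClusters([[0, 0], [10, 10]], counts=[4, 4])
clusters.add([1, 0])                  # joins the first cluster, small shift
clusters.add([10, 12])                # joins the second cluster
print(clusters.needs_reclustering())
```

With many vectors already in a cluster, each update moves the centroid by only a fraction of the distance to the new vector, which is why single additions rarely change the cluster distribution while their sum eventually does.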

“…In the second phase, the clustering of similar vectors occurs due to higher search efficiency. To cluster similar vectors, we use the known and widely used K-Means algorithm, which we modify for our purposes [29]. The result of this phase will be the data that are ready to be stored in a database in a form that allows them to be easily looked up.…”

Section: Figure 1 Structure Of the Designed System (mentioning)

confidence: 99%

“…For the needs of the source code clustering, we used the well-known K-Means method for characteristic vectors and we designed its extension for the needs of continuous source code processing [29]. Incremental clustering is based on the fact that if we already have clusters with a sufficient number of vectors, the addition of one vector to this system will not cause a fundamental change in the distribution of clusters.…”

Section: Figure 8 Incremental Clustering Scheme (mentioning)

confidence: 99%

“…However, using the designed algorithm, this parameter is dynamic. We can always reduce / increase the number of clusters when re-clustering [29].…”

Section: Figure 8 Incremental Clustering Scheme (mentioning)

confidence: 99%

“…It was not possible to index a vector that has more than 100 elements, so we detected significant components using conditional entropy and indexed only those. It proved that on average it is enough to select only five elements of the vector and create an index from them [29].…”

Section: Figure 8 Incremental Clustering Scheme (mentioning)

confidence: 99%
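The quoted statement does not say how conditional entropy was applied, so the following Python sketch shows only one plausible reading. It assumes that each vector component holds a discrete count, that cluster labels are available, and that a component is "significant" when knowing its value leaves little uncertainty about the cluster, i.e. when H(label | component) is low. The function names and the default `k=5` (taken from the "five elements" in the statement) are illustrative.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(labels, values):
    """H(labels | values) in bits for two equal-length discrete sequences."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[value].append(label)
    total = len(labels)
    entropy = 0.0
    for group in groups.values():
        weight = len(group) / total          # P(value)
        for count in Counter(group).values():
            p = count / len(group)           # P(label | value)
            entropy -= weight * p * math.log2(p)
    return entropy


def select_components(vectors, labels, k=5):
    """Indices of the k components with the lowest H(label | component)."""
    dims = len(vectors[0])
    scores = [
        (conditional_entropy(labels, [v[j] for v in vectors]), j)
        for j in range(dims)
    ]
    return sorted(j for _, j in sorted(scores)[:k])


def index_key(vector, components):
    """Short tuple built from the selected components, usable as an index key."""
    return tuple(vector[j] for j in components)


vectors = [[0, 1, 5], [0, 2, 5], [1, 1, 5], [1, 2, 5]]
labels = [0, 0, 1, 1]
chosen = select_components(vectors, labels, k=1)
print(chosen, index_key(vectors[2], chosen))
```

A component that is constant across all vectors scores the full label entropy under this measure and is never selected, which matches the intuition that it cannot help narrow a lookup.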


Abstract Syntax Tree Based Source Code Antiplagiarism System for Large Projects Set

et al. 2020

Self Cite


The paper deals with the issue of detecting plagiarism in source code, which we unfortunately encounter when teaching subjects dealing with programming and software development. Many students want to simplify the completion of the course and therefore submit modified source codes of their classmates or even those found on the Internet. Some try to modify the source code e.g. by changing the identifiers of classes, methods and variables to different ones, by changing the corresponding loops, by introducing new methods or by changing the order of methods in the source code or in other ways. We focused directly on this problem and designed our own anti-plagiarism system that we describe in this paper. The designed system consists of three parts during which the source code is processed using six designed algorithms. The basis is the processing of the source code and its transformation into an abstract syntax tree, consisting of two types of nodes, which is then vectorized using our modified DECKARD algorithm. The vectors are then clustered and stored in a database from which similar parts of the source code can be searched. The output of the system is then the final report containing a list of matches with similarities of all works that have been added to the database until then. The designed anti-plagiarism system is finally compared with the success of plagiarism detection performed by the two most used anti-plagiarism tools, namely JPlag and MOSS. It is evaluated on assignments elaborated by students from the courses dealing with object-oriented programming at our faculty.


Editorial on Innovative Network Systems and Applications together with the Conference on Information Systems Innovations for Community Services

Hodoň, Furtak, Fahrnberger et al. 2020

Concurrency and Computation


The purpose of this special issue is to assemble a selection of the best research articles presented at the 6th International Conference on Innovative Network Systems and Applications (iNetSApp'18) [1], which was organized under the Federated Conference on Computer Science and Information Systems 2018 (FedCSIS'18) [2], held in the Polish city of Poznań. In line with FedCSIS policy, a 23% acceptance rate was kept across all regular paper submissions with the help of the well-structured and experienced conference program committee. Since only a small number of papers met the quality required for publication in this special issue, further best papers were selected from the 18th International Conference on Innovations for Community Services (I4CS 2018) [3], which was held in Žilina, Slovakia. For this special issue, only the papers with the best review scores were selected, so that the quality of the CPE series can be preserved.

This special issue focuses on different applications and algorithm developments in the area of modern network systems, which encompass a wide range of solutions and technologies, including wireless and wired networks, network systems, services, and applications. The scope of this special issue is broad, aiming at different results in numerous active research areas oriented toward various technical, scientific, and social aspects of network systems and applications. This is complemented by recent advances in the theory and practice of a wide range of aspects of Internet community services, especially how community services can be used in many areas and how they have been deployed.

The authors in Reference [4] focused on improving the convergence time in IP networks. They introduced a new kind of IP fast reroute (IPFRR) mechanism called the multicast repair (M-REP) IPFRR mechanism, which provides an advanced fast reroute technique for Internet service providers' core networks. The M-REP IPFRR mechanism is based on IP multicast and utilizes Protocol Independent Multicast-Dense Mode with a modification of the internal reverse path forwarding check. The M-REP IPFRR mechanism does not depend on any particular routing protocol type (distance-vector or link-state) and requires fewer system resources because of its precomputation-less character, but still provides an immediate reaction to failure.

The article [5] dealt with the problem of working with large datasets. It proposed a Flower Index Approach as the method for data processing. With this approach, it was possible to remove the impact of the High-Water Mark, as well as to avoid loading blocks that hold no relevant data. It also provided an effective data access stream using a specific index: the Full Table Scan could be skipped and data could be accessed directly using the index ROWID locators.

Data reliability with temporal data retrieval was discussed in the article [6]. The authors referenced an attribute-oriented temporal model with reflection on data grouping tech...


Revealing Patterns for Programs in Graphical Visual Languages Using Clustering

Staroletov 2024

2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)

