Clone Detection in Reuse of Software Technical Documentation

Koznov, Dmitrij V.; Luciv, Dmitry; Basit, Hamid Abdul; Lieh, Ouh Eng; Smirnov, M. N.

doi:10.1007/978-3-319-41579-6_14

Cited by 9 publications

(12 citation statements)

References 22 publications

“…In our previous studies [8,9], we presented an approach that offers a partial solution for this problem. At first, similarly to Juergens et al [2], Wingkvist et al [6], we applied software clone detection techniques to exact duplicate detection [37]. Then, in [8,9] we proposed an approach to near duplicate detection.…”

Section: Basic Near Duplicate Detection and Refactoring (mentioning)

confidence: 99%

“…At first, similarly to Juergens et al [2], Wingkvist et al [6], we applied software clone detection techniques to exact duplicate detection [37]. Then, in [8,9] we proposed an approach to near duplicate detection. It is essentially as follows: having exact duplicates found by Clone Miner, we extract sets of duplicate groups where clones are located close to each other.…”

Section: Basic Near Duplicate Detection and Refactoring (mentioning)

confidence: 99%
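
The grouping step quoted above (collecting exact duplicate groups whose clones lie close to each other) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `(start, end)` offset representation, the `max_gap` threshold, and the function name `merge_if_close` are hypothetical and are not taken from Clone Miner or from the cited papers.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]   # (start, end) character offsets, end exclusive
Group = List[Span]       # all occurrences of one exact duplicate


def merge_if_close(a: Group, b: Group, max_gap: int) -> Optional[Group]:
    """Merge two exact-duplicate groups into one near-duplicate candidate.

    Hypothetical rule: the groups are merged only if they have the same
    number of occurrences and each occurrence of `b` starts at most
    `max_gap` characters after the matching occurrence of `a` ends.
    """
    if len(a) != len(b):
        return None
    merged = []
    for (a_start, a_end), (b_start, b_end) in zip(sorted(a), sorted(b)):
        gap = b_start - a_end
        if gap < 0 or gap > max_gap:
            return None
        # The merged span covers both clones and the variable text between them.
        merged.append((a_start, b_end))
    return merged


# Two exact duplicates, each occurring twice, separated by short gaps.
group_a = [(0, 40), (500, 540)]
group_b = [(48, 90), (545, 587)]
near = merge_if_close(group_a, group_b, max_gap=10)
```

With these sample offsets the gaps are 8 and 5 characters, so the two groups combine into one candidate whose spans include the differing text between the clones; a stricter `max_gap` would reject the merge.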

“…The initial interval tree for is constructed using the () function (line 2). The core part of the algorithm is a loop in which new near duplicate groups are constructed (lines 3-18…”

Section: Algorithm Description (mentioning)

confidence: 99%
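
The statement above mentions an interval tree over duplicate positions. As a rough illustration of the kind of overlap query such a structure answers, the sketch below uses a sorted list with binary search as a simple stand-in; it is not an interval tree and not the cited algorithm. The class name `SpanIndex` and its method are hypothetical.

```python
import bisect
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


class SpanIndex:
    """Sorted-list stand-in for an interval tree (assumption, see lead-in)."""

    def __init__(self, spans: List[Span]) -> None:
        self.spans = sorted(spans)
        self.starts = [start for start, _ in self.spans]

    def overlapping(self, lo: int, hi: int) -> List[Span]:
        """Return the spans that intersect the half-open window [lo, hi)."""
        # Only spans that start before `hi` can intersect the window.
        cut = bisect.bisect_left(self.starts, hi)
        # Of those, keep the ones that end after the window begins.
        return [(s, e) for s, e in self.spans[:cut] if e > lo]


index = SpanIndex([(0, 40), (48, 90), (500, 540)])
neighbours = index.overlapping(40, 60)
```

A real interval tree answers the same query in logarithmic time plus the size of the result; the linear filter here only keeps the example short.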

“…Their approach was based on an informal definition of near duplicates and the problem of duplicate detection was not addressed. In our previous studies [8,9] we presented a near duplicate detection approach. Its core idea is to uncover near duplicates and then to apply the reuse techniques described in our earlier studies [4,5].…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Near Duplicates in Software Documentation

Luciv

Koznov

Chernishev

et al. 2018

Program Comput Soft

Contemporary software documentation is as complicated as the software itself. During its lifecycle, the documentation accumulates many "near duplicate" fragments, i.e. chunks of text that were copied from a single source and later modified in different ways. Such near duplicates decrease documentation quality and thus hamper its further use. At the same time, they are hard to detect manually due to their fuzzy nature. In this paper we give a formal definition of near duplicates and present an algorithm for their detection in software documents. This algorithm is based on the exact software clone detection approach: the software clone detection tool Clone Miner was adapted to detect exact duplicates in documents. Our algorithm then uses these exact duplicates to construct near ones. We evaluate the proposed algorithm using the documentation of 19 open source and commercial projects. Our evaluation is comprehensive: it covers design and requirement specifications, programming guides and API documentation, and user manuals. Overall, the evaluation shows that all kinds of software documentation contain a significant number of both exact and near duplicates. Next, we report on a manual analysis of the near duplicates detected in the Linux Kernel Documentation. We present both quantitative and qualitative results of this analysis, demonstrate the algorithm's strengths and weaknesses, and discuss the benefits of duplicate management in software documents.

“…Duplicates in software documentation have been extensively studied during the last decade [6,11,12,13,14,15,16,17]. At the same time, there are no specialized tools for duplicate detection.…”

Section: Introduction (mentioning)

confidence: 99%

Interactive Near Duplicate Search in Software Documentation

Luciv

Koznov

Shelikhovskii

et al. 2019

Program Comput Soft

Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder documentation maintenance. Therefore, the task of duplicate detection in software documentation is important. Solving it makes planned reuse possible, as well as the creation and use of templates for unification and automatic generation of documentation. In this paper, we present an interactive process for duplicate detection that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based near duplicate search algorithm, and the proof of its completeness. Moreover, we demonstrate the results of experiments on a collection of documents from several industrial projects.
