2005
DOI: 10.1007/11497455_41

Measuring Similarity of Large Software Systems Based on Source Code Correspondence

Abstract: It is an important and intriguing issue to know the quantitative similarity of large software systems. In this paper, a similarity metric between two sets of source code files based on the correspondence of overall source code lines is proposed. A Software similarity MeAsurement Tool (SMAT) was developed and applied to various versions of an operating system (BSD UNIX). The resulting similarity valuations clearly revealed the evolutionary history characteristics of the BSD UNIX operating system.

Cited by 37 publications (25 citation statements)
References 29 publications
“…Three approaches similar to ours are [19,14,22]. The first one detects function clones by comparing a set of metrics for each combination of functions and then categorizes the results on an ordinal scale.…”
Section: Related Work (mentioning)
confidence: 99%
“…Yamamoto et al proposed SMAT tool that calculates similarity of software systems by counting similar lines of source code [17]. They identify corresponding source files between two software systems using CCFinder [10], and then compute differences between file pairs.…”
Section: Software Evolution (mentioning)
confidence: 99%
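
The per-file step described in the statement above (computing differences between a pair of corresponding files) can be illustrated with a minimal sketch. It is not SMAT's implementation: it assumes the file pair has already been identified, and it substitutes Python's standard `difflib` line diff for the tooling the statement names. The function name `count_corresponding_lines` is hypothetical.

```python
from difflib import SequenceMatcher


def count_corresponding_lines(lines_a, lines_b):
    """Count lines that a line-level diff matches between two files.

    Assumption: an ordinary longest-matching-block diff stands in for
    whatever correspondence procedure the original tool uses.
    """
    matcher = SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, so summing is safe.
    return sum(block.size for block in matcher.get_matching_blocks())


# Illustrative input: two versions of a tiny C file.
old_version = ["int main() {", "  return 0;", "}"]
new_version = ["int main() {", '  puts("hi");', "  return 0;", "}"]

matched = count_corresponding_lines(old_version, new_version)
print(matched)  # number of lines the diff considers unchanged
```

Lines the diff does not match are the added or deleted lines of the pair.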
“…The first approach performs cluster analysis for the sets of source code [6]. This is based on the similarity of two sets of source code, which is defined as the ratio of the numbers of similar code lines to that of the overall lines of two software systems.…”
Section: Automatic Categorization (mentioning)
confidence: 99%
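
The ratio described in the statement above (similar lines over the overall lines of both systems) can be sketched as follows. This is a simplified illustration, not the paper's exact definition: it assumes a line counts as "similar" when the same whitespace-stripped text occurs anywhere in the other system, and the name `set_similarity` and the dict-of-files representation are hypothetical.

```python
def set_similarity(system_a, system_b):
    """Ratio of similar lines to all lines across two sets of source files.

    Each system is a dict mapping a file name to its list of lines.
    Assumption: a line is "similar" if its stripped text appears anywhere
    in the other system; a real tool would use file-level correspondence.
    """
    lines_a = [ln.strip() for f in system_a.values() for ln in f]
    lines_b = [ln.strip() for f in system_b.values() for ln in f]
    total = len(lines_a) + len(lines_b)
    if total == 0:
        return 0.0
    text_a, text_b = set(lines_a), set(lines_b)
    similar = sum(ln in text_b for ln in lines_a)
    similar += sum(ln in text_a for ln in lines_b)
    return similar / total


# Illustrative input: system B shares two of system A's lines.
sys_a = {"x.c": ["a", "b"]}
sys_b = {"x.c": ["a", "c"], "y.c": ["d", "b"]}
score = set_similarity(sys_a, sys_b)
print(round(score, 3))
```

Under this reading the value lies between 0 (no shared lines) and 1 (every line of each system appears in the other).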