2019
DOI: 10.2197/ipsjjip.27.42
|View full text |Cite
|
Sign up to set email alerts
|

Hierarchical Clustering of OSS License Statements toward Automatic Generation of License Rules

Abstract: Reusing open source software (OSS) components for one's own software products has become common in the modern software development. Automated license identification tools have been proposed to help developers identify OSS licenses, since a large number of licenses sometimes must be checked before attempting to reuse. Of the existing tools, Ninka [1] can most correctly identify licenses of each source file by using regular expressions. In case Ninka does not have license identification rules for unknown license… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
8
0

Year Published

2023
2023
2023
2023

Publication Types

Select...
1

Relationship

1
0

Authors

Journals

citations
Cited by 1 publication
(8 citation statements)
references
References 11 publications
0
8
0
Order By: Relevance
“…Figure 2 shows an overview of the proposed method consisting of the following 5 major processes. 3.1 First, the clustering method of our previous studies [2], [3] is used to classify the unknown license statements according to the similarity of the word vectors (Bag-of-Words). 3.2 Next, the license statements that cannot be classified due to slight differences in minor versions are flittered out as outliers by the similarity based on the Levenshtein distance.…”
Section: Proposed Methodsmentioning
confidence: 99%
See 4 more Smart Citations
“…Figure 2 shows an overview of the proposed method consisting of the following 5 major processes. 3.1 First, the clustering method of our previous studies [2], [3] is used to classify the unknown license statements according to the similarity of the word vectors (Bag-of-Words). 3.2 Next, the license statements that cannot be classified due to slight differences in minor versions are flittered out as outliers by the similarity based on the Levenshtein distance.…”
Section: Proposed Methodsmentioning
confidence: 99%
“…(Step 3) Generating license rules: Tokenize license statements as regular expressions and generate license rules that can be matched to new licenses. In our previous study [2], [3], we proposed a clustering method to classify the license statements of detected unknown licenses to automate Step 1. In this paper, we focus on Step 2 and Step 3 to automatically generate license rules from each cluster created by the clustering method.…”
Section: (Step 1) Grouping Source Files With Unknown Licensesmentioning
confidence: 99%
See 3 more Smart Citations