2009 6th IEEE International Working Conference on Mining Software Repositories
DOI: 10.1109/msr.2009.5069496

Using Latent Dirichlet Allocation for automatic categorization of software

Cited by 136 publications (77 citation statements)
References 19 publications
“…First, topic modeling using LDA has been applied to solve a wide range of problems in software engineering such as: statistical debugging (Andrzejewski et al 2007), mining business topics (Maskeri et al 2008), mining author-topic models (Linstead et al 2007b), software traceability (Asuncion et al 2010), software categorization (Tian et al 2009), bug localization (Lukins et al 2008) etc. In an earlier work we used LDA topic modeling to mine topics from large corpus of source code, and showed that topics that emerge often resemble widely known aspects or concerns in source code (Baldi et al 2008).…”
Section: Topic Modeling
Citation type: mentioning
confidence: 99%
“…LACT is another system that relies on information retrieval to categorize software (Tian et al 2009). LACT uses Latent Dirichlet Allocation (LDA) over the same dataset as Kawaguchi et al in order to infer topics to which applications belong.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In this section we provide details behind LDA followed by how RTM extends this model to capture links among documents. While LDA has been previously applied in the context of software engineering for measuring conceptual cohesion of classes [27], recovering traceability links [3,31], mining software repositories [4,30,36] and bug location [28], RTM has not been utilized for software measurement tasks before.…”
Section: Using Relational Topic Models For Coupling Measurement
Citation type: mentioning
confidence: 99%
“…Applying PCA to metrics data consist of the following steps: collecting the metrics data, identifying outliers, and performing PCA. We applied PCA in the similar manner as in our previous work [29,31,36], including procedures on identifying outliers and rotating principal components. Overall, by performing PCA we can identify groups of variables (i.e., coupling metrics), which are likely to measure the same underlying dimension (i.e., specific mechanism that defines coupling) of the object to be measured (i.e., coupling of classes).…”
Section: Principal Component Analysis
Citation type: mentioning
confidence: 99%