Estimating the Optimal Number of Latent Concepts in Source Code Analysis

Grant, Scott; Cordy, James R.

doi:10.1109/scam.2010.22

Cited by 45 publications

(49 citation statements)

References 25 publications

“…A few articles proposed approaches determining the number of topics [28], [29], but they were task-specific. Our paper explore the application of topic modeling in a generic perspective other than a task-driven style, so we need to be groping for other estimating approach.…”

Section: The Number Of Topics (mentioning)

confidence: 99%

“…For instance, we used JHotDraw [27], a Java GUI framework, as our learning object. Referring to Grant and Cordy [29], where they think 100 to 200 is the best area for the number of topics of JHotDraw, we tested the number of topics ranging from 50 to 250 in 10 increments and evaluated each result using our Naive Criterion. We found that 80 is the most optimum value for the number of topics of JHotDraw.…”

Section: The Number Of Topics (mentioning)

confidence: 99%

JSEA: A Program Comprehension Tool Adopting LDA-based Topic Modeling

Wang, Liu

2017

IJACSA

Abstract: Understanding a large number of source code is a big challenge for software development teams in software maintenance process. Using topic models is a promising way to automatically discover feature and structure from textual software assets, and thus support developers comprehending programs on software maintenance. To explore the application of applying topic modeling to software engineering practice, we proposed JSEA (Java Software Engineers Assistant), an interactive program comprehension tool adopting LDA-based topic modeling, to support developers during performing software maintenance tasks. JSEA utilizes essential information automatically generated from Java source code to establish a project overview and to bring search capability for software engineers. The results of our preliminary experimentation suggest the practicality of JSEA.

“…However, in light of a recent study that showed that source code is exhibiting different characteristics that natural language text (e.g., it is more predictable and more repetitive) [6], we argue that using the same parameter values used in the IR community may not produce optimal results for SE. Although there were some heuristics [15,16] for configuring LDA parameters, these approaches focus only on configuring the number of topics, excluding the other hyper-parameters.…”

Section: B. LDA-GA (mentioning)

confidence: 99%

Configuring topic models for software engineering tasks in TraceLab

Dit, Panichella, Moritz, et al.

2013

2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE)

Abstract: A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although in theory these topic models can produce very good results if they are configured properly, in reality their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), which could potentially lead to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, which was shown to produce superior results for traceability link recovery and other tasks than reported ad-hoc configurations. LDA-GA works by optimizing the coherence of topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment, making publicly available all the implemented components, the datasets and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.

“…Such methods can be a) manual, based on a domain expert understanding of the system [7,180], b) experimentally determined, in which LDA parameters are tuned until a configuration that achieves acceptable performance over a certain quality measure is reached [16,22], or c) automatically generated using statistical methods or machine learning approaches [101,202].…”

Section: Latent Dirichlet Allocation (mentioning)

confidence: 99%

Toward an effective automated tracing process

Mahmoud

2012

2012 20th IEEE International Conference on Program Comprehension (ICPC)

Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project's life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
