Statistical Debugging Using Latent Topic Models

Andrzejewski, David; Mulhern, Anne; Liblit, Ben; Zhu, Xiaojin

doi:10.1007/978-3-540-74958-5_5

Cited by 63 publications

(60 citation statements)

References 20 publications

“…First, topic modeling using LDA has been applied to solve a wide range of problems in software engineering such as: statistical debugging (Andrzejewski et al 2007), mining business topics (Maskeri et al 2008), mining author-topic models (Linstead et al 2007b), software traceability (Asuncion et al 2010), software categorization (Tian et al 2009), bug localization (Lukins et al 2008) etc. In an earlier work we used LDA topic modeling to mine topics from large corpus of source code, and showed that topics that emerge often resemble widely known aspects or concerns in source code (Baldi et al 2008).…”

Section: Topic Modeling (mentioning)

confidence: 99%

Analyzing and mining a code search engine usage log

Bajracharya

Lopes

2010

Empir Software Eng

This paper presents an analysis of a year-long usage log of Koders, the first commercially available Internet-Scale code search engine (http://www.koders.com). The usage log comprises about ten million activities from more than three million users. Analysis of the usage data shows that despite attracting a large number of visitors, Koders has a very sparse usage and that it lacks regular usage from many of its users. When compared to Web search, search behavior in Koders showed many similar patterns. A topic modeling analysis of the usage data shows what topics users of Koders are looking for. Observations on the prevalence of these topics among the users, and observations on how search and download activities vary across topics, lead to the conclusion that users who find code search engines usable are those who already know to a high level of specificity what to look for. This paper also presents a general categorization of these topics that provides insights on the different ways code search engine users express their queries. It identifies various forms of queries in Koders's log and the kinds of results addressed by the queries. It also provides several suggestions for improvements in code search engines based on the analysis of usage, topics, and query forms. The work presented in this paper is the first of its kind that reveals several insights on the usage of an Internet-Scale code search engine.


“…Our algorithm is able to learn joint models of both typical and rare behaviours even if they co-exist. MC-∆LDA is a generalisation and completion of the ∆LDA model proposed in [12]. ∆LDA was used for understanding code bugs in computer programs, but without an inference framework for the labels of unseen documents, ∆LDA cannot be used for classification.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Rare Behaviours

Hospedales

Gong

et al. 2011

Computer Vision – ACCV 2010

Abstract. We present a novel approach to detect and classify rare behaviours which are visually subtle and occur sparsely in the presence of overwhelming typical behaviours. We treat this as a weakly supervised classification problem and propose a novel topic model: Multi-Class Delta Latent Dirichlet Allocation which learns to model rare behaviours from a few weakly labelled videos as well as typical behaviours from uninteresting videos by collaboratively sharing features among all classes of footage. The learned model is able to accurately classify unseen data. We further explore a novel method for detecting unknown rare behaviours in unseen data by synthesising new plausible topics to hypothesise any potential behavioural conflicts. Extensive validation using both simulated and real-world CCTV video data demonstrates the superior performance of the proposed framework compared to conventional unsupervised detection and supervised classification approaches.

“…We follow the sites-and-predicates approach commonly used in prior work [1,17,26–28,45]. An instrumentation site is a single program location at which the state of the running program will be inspected.…”

Section: Terminology (mentioning)

confidence: 99%

“…Statistical debugging techniques monitor run-time behavior to identify causes of crashes in end-user executions. Lightweight instrumentation [26] allows non-intrusive post-deployment monitoring, while statistical models [1,17,18,26–28,45] identify profiled events that strongly predict crashes or other failures. Yet most programs mostly work: nearly all code in any given application is not relevant for any given bug.…”

Section: Introduction (mentioning)

confidence: 99%

Cooperative Bug Isolation

2007

Lecture Notes in Computer Science

Statistical debugging uses lightweight instrumentation and statistical models to identify program behaviors that are strongly predictive of failure. However, most software is mostly correct; nearly all monitored behaviors are poor predictors of failure. We propose an adaptive monitoring strategy that mitigates the overhead associated with monitoring poor failure predictors. We begin by monitoring a small portion of the program, then automatically refine instrumentation over time to zero in on bugs. We formulate this approach as a search on the control-dependence graph of the program. We present and evaluate various heuristics that can be used for this search. We also discuss the construction of a binary instrumentor for incorporating the feedback loop into post-deployment monitoring. Performance measurements show that adaptive bug isolation yields an average performance overhead of 1% for a class of large applications, as opposed to 87% for realistic sampling-based instrumentation and 300% for complete binary instrumentation.
