Context: Data miners have been widely used in software engineering, for example to generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are easy and useful to apply. But one of the "black arts" of data mining is setting the tunings that control the miner. Objective: We seek a simple, automatic, and very effective method for finding those tunings. Method: For each experiment with different data sets (from open-source Java systems), we ran differential evolution as an optimizer to explore the tuning space (as a first step), then tested the tunings using hold-out data. Results: Contrary to our prior expectations, we found that tuning was remarkably simple: it required only tens, not thousands, of attempts to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that raise detection precision from 0% to 60%. Conclusion: Since (1) the improvements are so large and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue.
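The tuning loop described above can be sketched as a minimal differential evolution optimizer. This is an illustrative sketch, not the paper's implementation: the population size, mutation factor, crossover rate, and the toy objective (standing in for a learner's error on hold-out data) are all assumptions.

```python
import random

def differential_evolution(objective, bounds, pop_size=10, generations=10,
                           f=0.75, cr=0.3, seed=1):
    """Minimal DE loop: for each member, build a trial from three other
    members, then keep the trial only if it scores better (lower)."""
    rng = random.Random(seed)
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    pop = [[rng.uniform(lo[i], hi[i]) for i in range(len(bounds))]
           for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for j in range(pop_size):
            a, b, c = rng.sample([p for k, p in enumerate(pop) if k != j], 3)
            trial = list(pop[j])
            for i in range(len(bounds)):
                if rng.random() < cr:  # crossover: sometimes mutate this dim
                    trial[i] = min(hi[i], max(lo[i], a[i] + f * (b[i] - c[i])))
            s = objective(trial)
            if s < scores[j]:          # greedy replacement
                pop[j], scores[j] = trial, s
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]

# Toy objective standing in for (1 - precision) of a tuned defect predictor;
# the true optimum of this stand-in is at x = 0.6.
best_x, best_score = differential_evolution(lambda x: (x[0] - 0.6) ** 2,
                                            bounds=[(0.0, 1.0)])
```

Note how few evaluations this needs: with 10 members and 10 generations, the tuner makes only about a hundred attempts, matching the paper's "tens, not thousands" observation in spirit.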
Software defect prediction is one of the most active research areas in software engineering. We can build a prediction model with defect data collected from a software project and predict defects in the same project, i.e., within-project defect prediction (WPDP). Researchers have also proposed cross-project defect prediction (CPDP) to predict defects in new projects that lack defect data, using prediction models built from other projects. Recent studies have shown CPDP to be feasible. However, CPDP requires that the source and target projects share an identical metric set. As a result, current techniques for CPDP are difficult to apply across projects with heterogeneous metric sets. To address this limitation, we propose heterogeneous defect prediction (HDP), which predicts defects across projects with heterogeneous metric sets. Our HDP approach conducts metric selection and metric matching to build a prediction model between projects with heterogeneous metric sets. Our empirical study on 28 subjects shows that about 68% of predictions using our approach outperform or are comparable to WPDP with statistical significance.
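The metric matching idea can be illustrated by pairing source and target metrics whose value distributions look similar. The sketch below uses a Kolmogorov-Smirnov statistic and a similarity cutoff; the metric names, values, and cutoff here are hypothetical, and the actual HDP matching analyzers may differ.

```python
def ks_statistic(xs, ys):
    """Kolmogorov-Smirnov statistic: maximum gap between the two
    empirical CDFs, evaluated at every observed value."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs + ys))
    cdf = lambda s, v: sum(1 for x in s if x <= v) / len(s)
    return max(abs(cdf(xs, v) - cdf(ys, v)) for v in grid)

def match_metrics(source, target, cutoff=0.25):
    """Greedily pair each target metric with the unused source metric
    whose value distribution is most similar (smallest KS statistic),
    keeping the pair only if similarity clears the cutoff."""
    pairs, used = [], set()
    for t_name, t_vals in target.items():
        candidates = [(ks_statistic(s_vals, t_vals), s_name)
                      for s_name, s_vals in source.items() if s_name not in used]
        if candidates:
            ks, s_name = min(candidates)
            if ks <= cutoff:
                pairs.append((s_name, t_name))
                used.add(s_name)
    return pairs

# Hypothetical projects: metric names and values are made up.
source = {"loc": [10, 20, 30, 40], "cc": [1, 2, 2, 3]}
target = {"wmc": [1, 2, 3, 3]}
pairs = match_metrics(source, target)  # "wmc" pairs with "cc", not "loc"
```

Once matched, a model trained on the source metrics can score target instances through the paired columns, which is what lets prediction cross heterogeneous metric sets.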
Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeling technique is Latent Dirichlet Allocation (LDA). When run on different orderings of the same data, LDA suffers from "order effects", i.e., different topics are generated if the order of the training data is shuffled. Such order effects introduce a systematic error into any study that relies on the topics. This error can lead to misleading results, specifically inaccurate topic descriptions and reduced efficacy of text mining classification. Objective: To provide a method that makes the distributions generated by LDA more stable and usable for further analysis. Method: We use LDADE, a search-based software engineering tool that uses Differential Evolution (DE) to tune LDA's parameters. LDADE is evaluated on data from a programmer information exchange site (Stack Overflow), on the titles and abstracts of thousands of Software Engineering (SE) papers, and on software defect reports from NASA. Results were collected across different implementations of LDA (Python+Scikit-Learn, Scala+Spark) on the Linux platform and for different kinds of LDA inference (VEM, Gibbs sampling). Results were scored via topic stability and text mining classification accuracy. Results: In all treatments: (i) standard LDA exhibits very large topic instability; (ii) LDADE's tunings dramatically reduce cluster instability; (iii) LDADE also leads to improved performance for supervised as well as unsupervised learning. Conclusion: Due to topic instability, using standard LDA with its "off-the-shelf" settings should now be deprecated. Also, in future, SE papers that use LDA should be required to test and (if needed) mitigate LDA topic instability. Finally, LDADE is a candidate technology for effectively and efficiently reducing that instability.
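Topic stability of the kind scored here can be illustrated by comparing the top-word sets that two LDA runs produce for their topics: a stable model yields topics whose top words largely overlap even after the training order is shuffled. This is a hedged sketch; the word sets below are invented, and LDADE's exact stability metric may differ.

```python
def topic_overlap(run_a, run_b):
    """Stability score in [0, 1]: for each topic in run_a (a set of its
    top words), take the best Jaccard overlap with any topic in run_b,
    then average over run_a's topics."""
    def jaccard(s, t):
        return len(s & t) / len(s | t)
    return sum(max(jaccard(a, b) for b in run_b) for a in run_a) / len(run_a)

# Hypothetical top-word sets from three LDA runs on shuffled data:
run1 = [{"java", "class", "method"}, {"test", "junit", "assert"}]
run2 = [{"junit", "test", "mock"}, {"java", "method", "object"}]
run3 = [{"python", "pip", "venv"}, {"git", "merge", "branch"}]

# run2 reordered and slightly altered run1's topics: moderate overlap.
stable = topic_overlap(run1, run2)
# run3 shares no top words with run1: zero overlap, i.e., full instability.
unstable = topic_overlap(run1, run3)
```

A DE tuner like the one LDADE uses can then treat this overlap score as its objective, searching LDA's parameters (number of topics, document-topic and topic-word priors) for settings that maximize it.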
Despite extensive research, many methods in software quality prediction still exhibit some degree of uncertainty in their results. Rather than treating this as a problem, this paper asks whether this uncertainty is a resource that can simplify software quality prediction. For example, Deb's principle of ϵ-dominance states that if there exists some ϵ value below which it is useless or impossible to distinguish results, then it is superfluous to explore anything less than ϵ. We say that for "large ϵ problems", the results space of learning effectively contains just a few regions. If many learners are applied to such large ϵ problems, they exhibit a "many roads lead to Rome" property; i.e., many different software quality prediction methods generate a small set of very similar results. This paper explores DART, an algorithm selected specifically to succeed on large ϵ software quality prediction problems. DART is remarkably simple yet, in our experiments, it dramatically outperforms three sets of state-of-the-art defect prediction methods. The success of DART for defect prediction raises the question: how many other domains in software quality prediction can also be radically simplified? This will be a fruitful direction for future work.
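The ϵ-dominance intuition can be illustrated by binning learner results into ϵ-wide cells: results falling in the same cell are, by assumption, indistinguishable, so a crowd of learners collapses into a handful of regions. The scores below are made up for illustration and are not results from the paper.

```python
def epsilon_regions(results, eps):
    """Truncate each result dimension down to an eps-wide cell; results in
    the same cell are indistinguishable under epsilon-dominance."""
    return {tuple(int(v / eps) for v in r) for r in results}

# Hypothetical (recall, false-alarm) scores from six different learners:
results = [(0.71, 0.20), (0.74, 0.22), (0.73, 0.21),   # near-identical cluster
           (0.40, 0.05), (0.42, 0.07),                 # a second cluster
           (0.90, 0.60)]                               # an outlier
regions = epsilon_regions(results, eps=0.1)
# Six "different" learners occupy only a few distinguishable regions.
```

When the number of such regions is small, an algorithm only needs to find one representative per region, which is the sense in which large ϵ problems admit very simple learners like DART.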