How far are we from reproducible research on code smell detection? A systematic literature review

Lewowski, Tomasz; Madeyski, Lech

doi:10.1016/j.infsof.2021.106783

Cited by 30 publications

(14 citation statements)

References 35 publications


“…In our previous work [10], we tackled the problem of developing a machine learning (ML)-based code smell detector. Like other researchers, we, unfortunately, found that the publicly available code smell datasets may be hard to reproduce [11], [12] and contain noisy labels due to annotators' inconsistent understanding of the code smells [12]. As the performance of ML models highly depends on the used dataset, the field of ML-based code smell detection would greatly benefit from a systematic approach to labeling code smells.…”

Section: Introduction (mentioning)

confidence: 91%

Automatic detection of code smells using metrics and CodeT5 embeddings: a case study in C#

Kovačević, Luburić, Slivka et al. 2022

Preprint

Code smells are code structures that harm the software’s quality. An obstacle to developing automatic detectors is the available datasets' limitations. Furthermore, researchers developed many solutions for Java while neglecting other programming languages. Recently, we created the code smell dataset for C# by following an annotation procedure inspired by the established annotation practices in Natural Language Processing. This paper evaluates Machine Learning (ML) code smell detection approaches on our novel dataset. We consider two feature representations to train ML models: (1) code metrics and (2) CodeT5 embeddings. This study is the first to consider the CodeT5 state-of-the-art neural source code embedding for code smell detection in C#. To prove the effectiveness of ML, we consider multiple metrics-based heuristics as alternatives. In our experiments, the best-performing approach was the ML classifier trained on code metrics (F-measure of 0.87 for Long Method and 0.91 for Large Class detection). However, the performance improvement over CodeT5 features is negligible if we consider the advantages of automatically inferring features. We showed that our model exceeds human performance and could be helpful to developers. To the best of our knowledge, this is the first study to compare the performance of automatic smell detectors against human performance.

“…To be qualified as a reproducible scientific study, the reported experimental results of a study should be obtained by other researchers using authors' artifacts (i.e., source code and datasets) with the same experimental setup. Some researchers pointed out the reproducibility issues in SE (Lewowski & Madeyski, 2022). Recently analyzed some studies on the use of DL models in solving a SE problem, like defect prediction or code clone detection.…”

Section: Reproducibility Package (mentioning)

confidence: 99%

“…Thus, we examined whether the authors of primary studies on SDP using DL publish reproduction packages for their studies. We used the categories used by Lewowski & Madeyski (2022) during data extraction. Figure 12 shows the results on the presence of a reproducibility package in the primary studies in our paper pool.…”

Section: Reproducibility Package (mentioning)

confidence: 99%

On the use of deep learning in software defect prediction

Giray, Bennin, Köksal et al. 2023

Journal of Systems and Software

“…There are two classical coefficients to measure the correlation between indicators: Spearman rank correlation coefficient and Pearson correlation coefficient [28]. The Pearson correlation coefficient has two restrictions: (1) the data obey the normal distribution; (2) the data units are consistent, and the zeros are relative, not absolute. If the measured metrics do not meet the Pearson conditions, it is necessary to consider the Spearman rank correlation coefficient.…”

Section: Spearman Rank Correlation Coefficient [27] (mentioning)

confidence: 99%
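
The distinction drawn in the statement above can be illustrated with a minimal sketch. It assumes plain Python with no external libraries; the function names (`pearson`, `ranks`, `spearman`) and the sample data are hypothetical and do not come from the cited study. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks of the data, so it only measures whether the relationship is monotonic and does not rely on the normality condition mentioned for Pearson.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: a monotonic but strongly non-linear relationship.
smell_counts = [1, 2, 3, 4, 5]
fault_counts = [1, 8, 27, 64, 125]

print("Pearson: ", pearson(smell_counts, fault_counts))
print("Spearman:", spearman(smell_counts, fault_counts))
```

On this made-up data the relationship is perfectly monotonic, so the rank-based coefficient is at its maximum, while the Pearson coefficient is lower because the relationship is not linear.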

“…The security and quality assurance of Android apps become crucial and vital to keep appealing and adapting to new devices. As a metric indicating the sub-optimal design, smells are the main culprit [1,2].…”

Section: Introduction (mentioning)

confidence: 99%

Security-based code smell definition, detection, and impact quantification in Android

Shi, Zhong et al. 2022

Preprint

Android occupies a high market share, and its broad functions make Android security matter. Research reveals that many security issues are caused by insecure coding practices. As a poor design indicator, code smell threatens the safety and quality assurance of Android applications (apps). Although previous works revealed specific problems associated with code smells, the field still lacks research reflecting Android features. Moreover, the cost and time limit developers to repairing numerous smells timely. We conducted a study, including definition, detection, and impact quantification for Android code smell (DefDIQ): (1) define 15 novel code smells in Android from a security programming perspective; meanwhile, we provide suggestions on how to eliminate or mitigate them; (2) implement DACS to automatically detect the custom code smells based on ASTs; (3) investigate the correlation between individual smells with DACS detection results, and select suitable code smells to construct fault counting models, then quantify their impact on quality, and thereby generating code smell repair priorities. We conducted experiments on 4,575 open-source apps, and the findings are: (i) Lin’s CCC between DACS and manual detection results reaches 0.9994, verifying the validity; (ii) the fault counting model constructed by ZINB is superior to NB (AIC = 517.32, BIC = 522.12); some smells do indicate fault-proneness, and we identify such avoidable poor designs; (iii) different code smells have different importance and the repair priorities constructed provide a practical guideline for researchers and inexperienced developers.
