When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories with 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe, and probably, which sets them apart from other comments. KL-SATD comment contents are similar to the manually labeled SATD comments of prior work. Our machine learning classifier using logistic Lasso regression performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should have one. Automating the identification of such comments can save time and effort compared to manual inspection. Using KL-SATD offers the potential to bootstrap a complete SATD detector.
Index Terms: Natural language processing; self-admitted technical debt; data mining
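As a minimal sketch of how such a classifier could be assembled, the snippet below trains an L1-penalized (Lasso-style) logistic regression on TF-IDF features of comment text and evaluates it with AUC-ROC, the metric reported in the abstract. The toy comments and labels, and the use of scikit-learn rather than whatever stack the paper used, are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: comment text and a KL-SATD label
# (1 = contains a SATD keyword such as TODO/FIXME, 0 = other comment).
comments = [
    "TODO: remove this workaround once the upstream bug is fixed",
    "FIXME maybe this breaks on empty input, probably needs a guard",
    "compute the checksum of the incoming buffer",
    "iterate over all configured request handlers",
    "XXX temporary hack, fix before the release",
    "parse the configuration file into a dictionary",
]
labels = [1, 1, 0, 0, 1, 0]

train_txt, test_txt, y_train, y_test = train_test_split(
    comments, labels, test_size=0.5, stratify=labels, random_state=0
)

# TF-IDF bag-of-words features over the comment text.
vectorizer = TfidfVectorizer(lowercase=True)
X_train = vectorizer.fit_transform(train_txt)
X_test = vectorizer.transform(test_txt)

# The L1 (Lasso) penalty performs word selection inside the classifier.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X_train, y_train)

# AUC-ROC, the performance metric reported in the abstract.
scores = clf.predict_proba(X_test)[:, 1]
print("AUC-ROC:", roc_auc_score(y_test, scores))
```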
A high imbalance exists between technical-debt and non-technical-debt source code comments. This imbalance affects Self-Admitted Technical Debt (SATD) detection performance, and the existing literature lacks empirical evidence on the choice of balancing technique. In this work, we evaluate the impact of multiple balancing techniques, including data-level, classifier-level, and hybrid approaches, for SATD detection in Within-Project and Cross-Project setups. Our results show that the data-level balancing technique SMOTE, or classifier-level ensemble approaches with Random Forest or XGBoost, are reasonable choices depending on whether the goal is to maximize Precision, Recall, F1, or AUC-ROC. We compared our best-performing model with the previous SATD detection benchmark (a cost-sensitive Convolutional Neural Network). Interestingly, the top-performing XGBoost with SMOTE sampling improved the Within-Project F1 score by 10% but fell short in the Cross-Project setup by 9%. This supports the higher generalization capability of deep learning in Cross-Project SATD detection, yet when working within individual projects, classical machine learning algorithms can deliver better performance. We also evaluate and quantify the impact of duplicate source code comments on SATD detection performance. Finally, we employ SHAP and discuss the interpreted SATD features. We include a replication package and share a web-based SATD prediction tool with the balancing techniques in this study.
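As a rough sketch of the data-level balancing step, the snippet below combines SMOTE oversampling with an XGBoost classifier inside an imbalanced-learn pipeline, so that oversampling is applied only to training folds during cross-validation. The synthetic imbalanced dataset and the hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

# Synthetic stand-in for an imbalanced SATD dataset:
# ~5% positive (SATD) samples, mirroring the strong class imbalance.
X, y = make_classification(
    n_samples=2000, n_features=50, weights=[0.95, 0.05], random_state=42
)

# imblearn's Pipeline applies SMOTE only when fitting, i.e. only on the
# training folds, which avoids leaking synthetic samples into evaluation.
pipe = Pipeline([
    ("balance", SMOTE(random_state=42)),            # data-level balancing
    ("clf", XGBClassifier(eval_metric="logloss")),  # ensemble classifier
])

# The four metrics discussed in the abstract.
scores = cross_validate(
    pipe, X, y, cv=5, scoring=["precision", "recall", "f1", "roc_auc"]
)
for metric in ("precision", "recall", "f1", "roc_auc"):
    print(metric, scores[f"test_{metric}"].mean().round(3))
```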
Self-admitted technical debt (SATD) refers to sub-optimal development solutions that developers express in written code comments or commit messages. We reproduce and improve on prior work by Yan et al. (2018) on detecting commits that introduce self-admitted technical debt. We use multiple natural language processing methods: Bag-of-Words, topic modeling, and word embedding vectors, and we study five open-source projects. Our NLP approach uses logistic Lasso regression from Glmnet to automatically select the best predictor words. A manually labeled dataset from prior work that identified self-admitted technical debt from code-level commits serves as ground truth. Our approach achieves a 0.15 higher area under the ROC curve than the prior work when comparing only commit message features, and a 0.03 better result overall when replacing manually selected features with automatically selected words. In both cases, the improvement was statistically significant (p < 0.0001). Our work makes four main contributions: comparing different NLP techniques for SATD detection, improving results over previous work, showing how to generate generalizable predictor words when using multiple repositories, and producing a list of words that correlate with SATD. As a concrete result, we release the list of predictor words that correlate positively with SATD, as well as our datasets and scripts, to enable replication studies and to aid in the creation of future classifiers.
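To make the word-selection idea concrete, here is a small sketch that fits a Bag-of-Words model on toy commit messages and prints the words with positive coefficients, i.e. candidate SATD predictor words. It uses scikit-learn's L1-penalized logistic regression as a stand-in for the glmnet Lasso named in the abstract; the messages and labels are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical commit messages labeled as SATD-introducing (1) or not (0).
messages = [
    "add quick hack to work around the broken parser",
    "temporary fix, should refactor this later",
    "ugly workaround for the race condition, fixme",
    "update documentation for the release",
    "bump dependency versions",
    "add unit tests for the scheduler",
]
labels = [1, 1, 1, 0, 0, 0]

# Bag-of-Words features over the commit messages.
vec = CountVectorizer(lowercase=True)
X = vec.fit_transform(messages)

# The L1 (Lasso) penalty drives most coefficients to zero, so the
# surviving nonzero weights act as automatically selected predictor
# words, analogous to the glmnet-based selection the abstract describes.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, labels)

# Report words with positive coefficients: words that correlate with
# SATD-introducing commits in this toy corpus.
words = vec.get_feature_names_out()
coefs = clf.coef_.ravel()
for word, weight in sorted(zip(words[coefs > 0], coefs[coefs > 0]),
                           key=lambda t: -t[1]):
    print(f"{word}: {weight:.2f}")
```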