Characterizing and Mitigating Self-Admitted Technical Debt in Build Systems

Xiao, Tao; Wang, Dong; McIntosh, Shane; Hata, Hideaki; Kula, Raula Gaikovina; Ishio, Takashi; Matsumoto, Kenichi

doi:10.1109/tse.2021.3115772

Cited by 16 publications

(16 citation statements)

References 47 publications


“…Finally, we retrieve 2,628,919 comments from build specification files in total. Similar to our prior work (Xiao et al, 2022), we identify SATD comments using the keywords-based approach of Potdar and Shihab (2014). We further discuss the rationale of electing the keywords-based approach instead of the existing machine learning based approach in Section 5.2.…”

Section: SATD Comments Extraction (mentioning)

confidence: 99%

“…Moreover, to reduce the impact of noisy text in comments, we remove special characters by using the regular expression [^A-Za-z0-9]+. Since stop words (e.g., "for" and "until") could convey critical semantics in the context of SATD comments, we opt to exclude stop word removal (Maipradit et al, 2020b;Xiao et al, 2022). We further filter out uninformative SATD comments that contain a single word (e.g., "TODO") since these annotations are highly likely to be cloned in the software development process.…”

Section: SATD Clones Identification (mentioning)

confidence: 99%
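
The preprocessing described in the statement above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `normalize_comment`, `is_informative`, and `preprocess` are hypothetical, and lowercasing and replacing each matched run with a single space are illustrative choices. Only the regular expression `[^A-Za-z0-9]+`, the decision to keep stop words, and the removal of single-word comments come from the quoted text.

```python
import re

# Pattern quoted in the citation statement: runs of non-alphanumeric characters.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9]+")


def normalize_comment(comment: str) -> str:
    """Replace special characters with spaces; stop words are kept.

    Assumption: lowercasing and collapsing each run to one space are
    illustrative choices, not steps stated in the quoted text.
    """
    return SPECIAL_CHARS.sub(" ", comment).strip().lower()


def is_informative(comment: str) -> bool:
    """Return False for comments that reduce to at most one word (e.g. "TODO")."""
    return len(normalize_comment(comment).split()) > 1


def preprocess(comments: list[str]) -> list[str]:
    """Normalize comments and drop the uninformative ones."""
    return [normalize_comment(c) for c in comments if is_informative(c)]


if __name__ == "__main__":
    sample = ["# TODO", "<!-- FIXME: remove this hack until upstream is fixed -->"]
    print(preprocess(sample))
```

Stop words such as "until" survive normalization here, which matches the rationale in the statement that they can carry meaning in SATD comments.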


Quantifying and characterizing clones of self-admitted technical debt in build systems

Xiao, Zeng, Wang et al. 2024. Empir Software Eng. Self-citation.

Self-Admitted Technical Debt (SATD) annotates development decisions that intentionally exchange long-term software artifact quality for short-term goals. Recent work explores the existence of SATD clones (duplicate or near duplicate SATD comments) in source code. Cloning of SATD in build systems (e.g., CMake and Maven) may propagate suboptimal design choices, threatening qualities of the build system that stakeholders rely upon (e.g., maintainability, reliability, repeatability). Hence, we conduct a large-scale study on 50,608 SATD comments extracted from Autotools, CMake, Maven, and Ant build systems to investigate the prevalence of SATD clones and to characterize their incidences. We observe that: (i) prior work suggests that 41-65% of SATD comments in source code are clones, but in our studied build system context, the rates range from 62% to 95%, suggesting that SATD clones are a more prevalent phenomenon in build systems than in source code; (ii) statements surrounding SATD clones are highly similar, with 76% of occurrences having similarity scores greater than 0.8; (iii) a quarter of SATD clones are introduced by the author of the original SATD statements; and (iv) among the most commonly


“…To understand the characteristics of images on Stack Overflow, we conducted a qualitative study of a statistically representative sample of all developer questions that contain at least one image in our dataset. Since images may come from different sources and contain different types of content, we adopt three dimensions of (a) the image source, (b) the image content, and (c) the purpose served by the image, which is similar to prior work [15,16]. Furthermore, to inform the tool design on whether support for images is crucial, we analyze the relationship between the image and the comprehension of the question understanding.…”

Section: (RQ1) What Are the Characteristics of Images Used in Stack O... (mentioning)

confidence: 99%

Understanding the Role of Images on Stack Overflow

Wang, Xiao, Kula et al. 2023. Preprint.

Images are increasingly being shared by software developers in diverse channels including question-and-answer forums like Stack Overflow. Although prior work has pointed out that these images are meaningful and provide complementary information compared to their associated text, how images are used to support questions is empirically unknown. To address this knowledge gap, in this paper we specifically conduct an empirical study to investigate (I) the characteristics of images, (II) the extent to which images are used in different question types, and (III) the role of images on receiving answers. Our results first show that user interface is the most common image content and undesired output is the most frequent purpose for sharing images. Moreover, these images essentially facilitate the understanding of 68% of sampled questions. Second, we find that discrepancy questions are relatively more frequent compared to those without images, but there are no significant differences observed in description length in all types of questions. Third, the quantitative results statistically validate that questions with images are more likely to receive accepted answers, but do not speed up the time to receive answers. Our work demonstrates the crucial role that images play by approaching the topic from a new angle and lays the foundation for future opportunities to use images to assist in tasks like generating questions and identifying question-relatedness.


“…To discover as complete of a list of reasons as possible, we strive for theoretical saturation (Eisenhardt, 1989) to achieve analytical generalization. Similar to the prior work (Xiao et al, 2021), we initially set our saturation criterion to 50. Then the first two authors continue to code randomly selected inconsistent comments until no new codes have been discovered for 50 consecutive comments.…”

Section: Consistency of Emoji Sentiments (RQ4) (mentioning)

confidence: 99%
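
The stopping rule in the statement above (keep coding randomly selected comments until 50 consecutive comments yield no new code) can be sketched in Python as below. This is a hedged illustration only, not the authors' procedure: `code_until_saturation`, the `assign_codes` callback standing in for the human coders, and the fixed random seed are all hypothetical. The threshold of 50 is the one value taken from the quoted text.

```python
import random
from typing import Callable, Iterable


def code_until_saturation(
    comments: Iterable[str],
    assign_codes: Callable[[str], set[str]],
    criterion: int = 50,
    seed: int = 0,
) -> tuple[set[str], int]:
    """Code comments in random order until `criterion` consecutive comments
    add no new code. Returns the discovered codes and the number of comments
    coded.

    `assign_codes` is a hypothetical stand-in for the manual coding step.
    """
    pool = list(comments)
    random.Random(seed).shuffle(pool)  # random selection order (assumed seed)

    codes: set[str] = set()
    streak = 0  # consecutive comments that produced no new code
    coded = 0
    for comment in pool:
        new_codes = assign_codes(comment) - codes
        codes |= new_codes
        coded += 1
        streak = 0 if new_codes else streak + 1
        if streak >= criterion:
            break  # saturation criterion met
    return codes, coded
```

If the pool runs out before the streak reaches the criterion, the function simply returns everything coded so far, so the caller can tell saturation was not reached by comparing the count with the pool size.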

“…Instead, we divided all PRs that contain emoji reactions into the ones by first-time contributors and the other ones by non first-time contributors. Third, during the manual classification of reasons behind sentiment inconsistency (RQ3), we did not calculate the Kappa score as the open coding process does not require it (Hirao et al, 2019;Xiao et al, 2021). 3.…”

Section: Deviations From the Registered Report (mentioning)

confidence: 99%

More than React: Investigating The Role of Emoji Reaction in GitHub Pull Requests

Teyon, Xiao, Wang et al. 2021. Preprint.

Open source software development has become more social and collaborative, as is evident on GitHub. Since 2016, GitHub has supported more informal methods such as emoji reactions, with the goal of reducing commenting noise when reviewing any code changes to a repository. From a code review context, the extent to which emoji reactions facilitate a more efficient review process is unknown. We conduct an empirical study to mine 1,850 active repositories across seven popular languages to analyze 365,811 Pull Requests (PRs) for their emoji reactions against the review time, first-time contributors, comment intentions, and the consistency of the sentiments. Answering these four research perspectives, we first find that the number of emoji reactions has a significant correlation with the review time. Second, our results show that a PR submitted by a first-time contributor is less likely to receive emoji reactions. Third, the results reveal that comments with an intention of information giving are more likely to receive an emoji reaction. Fourth, we observe that only a small proportion of sentiments are not consistent between comments and emoji reactions, i.e., with 11.8% of instances being identified. In these cases, the prevalent reason is when reviewers cheer up authors that admit to a mistake, i.e., acknowledge a mistake. Apart from reducing commenting noise, our work suggests that emoji reactions play a positive role in facilitating collaborative communication during the review process.
