2019 26th Asia-Pacific Software Engineering Conference (APSEC)
DOI: 10.1109/apsec48747.2019.00050
Automatic Classifying Self-Admitted Technical Debt Using N-Gram IDF

Abstract: Technical Debt (TD) introduces quality problems and increases maintenance cost, since it may require improvements in the future. Several studies show that it is possible to automatically detect TD from source code comments that developers intentionally created, so-called self-admitted technical debt (SATD). Those studies proposed using a binary classification technique to predict whether a comment indicates SATD. However, SATD has different types (e.g. design SATD and requirement SATD). In this paper, we therefore …
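The abstract describes moving from binary SATD detection toward classifying SATD types using N-gram IDF features. The paper's exact pipeline is not reproduced here; as a minimal pure-Python sketch of the underlying idea, IDF weights over multi-word phrases (n-grams) can be computed like this (the toy comment corpus and the `max_n` choice are illustrative assumptions):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_idf(comments, max_n=2):
    """Compute IDF weights for all 1..max_n grams over a comment corpus.

    IDF = log(total documents / documents containing the gram); rare
    multi-word phrases (e.g. "todo fix") get higher weight than words
    that appear in every comment.
    """
    doc_freq = Counter()
    for comment in comments:
        tokens = comment.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
        doc_freq.update(seen)  # count each gram once per document
    total = len(comments)
    return {g: math.log(total / df) for g, df in doc_freq.items()}

# Toy corpus of code comments (assumption, for illustration only)
comments = [
    "todo fix this later",
    "todo refactor this method",
    "this works fine",
]
idf = ngram_idf(comments)
# "this" appears in all 3 comments -> IDF 0; "todo fix" in 1 -> IDF log(3)
```

Such weights would then feed a (multi-class) classifier over SATD types; the classifier itself is omitted here.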

Cited by 11 publications (7 citation statements)
References 37 publications
“…-Survey more advanced feature engineering in the active learning strategy for finding the rest of SATDs. For example, explore N-gram patterns [72] and word embeddings with deep neural networks [19]. -Explore other sampling techniques to help with unbalanced class data (one of the key characteristics for SATDs [55]).…”
Section: Discussion
confidence: 99%
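The quoted statement also points at sampling techniques for the class imbalance typical of SATD data. One simple, commonly used option (an illustrative sketch, not the cited papers' specific method) is random oversampling of the minority class:

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class examples until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # sample with replacement
            out_y.append(cls)
    return out_x, out_y

# Toy data (assumption): 4 non-SATD comments, 1 SATD comment
x = ["c1", "c2", "c3", "c4", "s1"]
y = [0, 0, 0, 0, 1]
bx, by = random_oversample(x, y)
# After oversampling, both classes have 4 examples
```

Random oversampling is the simplest option; in practice techniques like SMOTE or class-weighted losses are common alternatives for the same problem.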
“…Recently, some studies have explored different feature engineering for identifying SATD, e.g. Wattanakriengkrai et al. [72] applied N-gram IDF as features, and Flisar and Podgorelec [19] explored how feature selection with word embeddings can help the prediction. The latest progress comes from Wang et al. [71]'s HATD and Ren et al. [55]'s tuned CNN, which utilize deep convolutional neural networks to achieve a higher F1 score than all previous solutions.…”
Section: Automatic Labeling
confidence: 99%
“…One of the threats to construct validity in the study concerns the potentially different interpretations of discussed topics between interviewees and researchers. Because we focus on SATD in this study and most …”

SATD sources and related work:
- Code Comments: [6], [7], [12], [14], [15], [38]–[66]
- Issue Trackers: [3], [12], [16]
- Commit Messages: [12]
- Pull Requests: [12]
- Automated Differentiation Between Fixed and Unfixed SATD
- Automated Tracing Between SATD in Different Sources [11], [12], [36], [37] and Code and Related Development Tasks
- Automated SATD Prioritization: [9], [67], …
Section: Threats To Validity 6.1 Construct Validity
confidence: 99%
“…2) BOW (Bag of Words): one way of extracting features from text into numbers by representing textual documents as sparse vectors of word counts [34]. 3) N-Gram: a text preprocessing model that represents text as contiguous sequences of n tokens (or characters), capturing local ordering that BOW discards.…”
Section: Variable Extraction
confidence: 99%
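As a concrete illustration of the BOW representation described in the quote (the toy documents are assumptions), each document can be encoded as a sparse vector of word counts over a shared vocabulary:

```python
from collections import Counter

def bow_vectors(docs):
    """Represent each document as a sparse vector {term_index: count} over a shared vocabulary."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        # Only non-zero entries are stored, hence "sparse"
        vectors.append({index[t]: c for t, c in counts.items()})
    return vocab, vectors

docs = ["todo fix this", "fix fix later"]
vocab, vecs = bow_vectors(docs)
# vocab: ['fix', 'later', 'this', 'todo']
# vecs[1]: {0: 2, 1: 1}  -> "fix" twice, "later" once
```

An n-gram variant would simply index tuples of adjacent tokens instead of single words, recovering some of the word order that plain BOW throws away.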