2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) 2021
DOI: 10.1109/msr52588.2021.00022
Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data

Cited by 8 publications (4 citation statements) | References 33 publications
“…Note that this study is an extension of our prior paper (Flint et al, 2021a). Compared to the previous paper, this work adds the following additional contributions:…”
Section: Introduction
confidence: 91%
“…There is some chance that the author timestamps used to chronologically order the data contain errors resulting from, e.g., misconfigured clocks or problems when converting old repositories into GitHub. However, such issues tend to affect a very small proportion of commits [34], being unlikely to negatively affect our analyses. An additional threat is the use of Commit Guru to collect labelled software changes.…”
Section: Threats To Validity
confidence: 98%
“…All software changes were chronologically ordered based on the author timestamp, which is provided by Git as the timestamp when the commit was first created by the author. The author timestamp is recommended over the committer timestamp for studies involving time-based Git data [34]. This timestamp is used in our paper as the moment in time when a JIT-SDP model is required to provide a prediction for this software change.…”
Section: Datasets
confidence: 99%
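The statement above describes ordering software changes by Git's author timestamp rather than the committer timestamp. The following is a minimal sketch of that idea, not the citing paper's actual pipeline; the function name and the `repo_path` argument are placeholders for illustration.

```python
# Minimal sketch: order commits chronologically by author timestamp,
# as recommended for time-based Git studies [34].
# NOTE: illustrative only; not the pipeline used by the citing paper.
import subprocess

def commits_by_author_time(repo_path):
    # %H = commit hash, %at = author timestamp (Unix epoch),
    # %ct = committer timestamp (Unix epoch)
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--pretty=format:%H %at %ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    commits = []
    for line in out.splitlines():
        sha, author_ts, committer_ts = line.split()
        commits.append((sha, int(author_ts), int(committer_ts)))
    # Sort by author timestamp: the moment the change was first created,
    # rather than when it was last rewritten (e.g., by a rebase).
    return sorted(commits, key=lambda c: c[1])
```

Sorting on the author timestamp keeps the ordering tied to when a change was originally written, which is why it is preferred over the committer timestamp for time-sensitive analyses such as just-in-time defect prediction.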
“…Falessi et al (2020) report on the importance of preserving the order of data between the training and testing sets. Afterward, the same issue was discussed in depth in Flint et al (2021). Thus, results are unrealistic if the underlying evaluation does not preserve the order of data. Falessi et al (2022) show that dormant defects impact classifiers' accuracy and hence their evaluation.…”
Section: Evaluations
confidence: 99%
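The statement above concerns preserving the chronological order of data between training and testing sets. Below is a minimal sketch of a chronology-preserving split under that assumption; the `author_ts` field name and the 80/20 split are illustrative and not taken from the cited studies.

```python
# Minimal sketch of a chronology-preserving train/test split.
# NOTE: illustrative only; field names and split ratio are assumptions.
def time_ordered_split(changes, train_fraction=0.8):
    # `changes` is a list of dicts, each with an 'author_ts' key (Unix epoch).
    ordered = sorted(changes, key=lambda c: c["author_ts"])
    cut = int(len(ordered) * train_fraction)
    train, test = ordered[:cut], ordered[cut:]
    # Every training change precedes every test change in author time,
    # so the evaluation never trains on data from the "future".
    return train, test
```

A random split would mix later changes into the training set and earlier ones into the test set, which is exactly the unrealistic evaluation the quoted passage warns against.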