2021
DOI: 10.48550/arxiv.2108.09847
Preprint

FRUGAL: Unlocking SSL for Software Analytics

Huy Tu, Tim Menzies

Abstract: Standard software analytics often requires a large amount of labelled data to train models with acceptable performance. However, prior work has shown that such requirements can be expensive (labelling thousands of commits can take several weeks), and such data is not always available when exploring new research problems and domains. Unsupervised learning is a promising direction for learning hidden patterns within unlabelled data, but it has only been extensively studied in defect prediction. Nevertheless, …


Cited by 1 publication (1 citation statement)
References 56 publications (115 reference statements)
“…On the other hand, automated mining for labels is far more likely to meet the demand for data quantity, but at the cost of introducing noise in the form of both false positives and false negatives [8,53]. Hence, collecting a large amount of good quality labelled data can pose a significant challenge for many important software engineering problems and tasks, in particular those that require single project data (i.e., within project) [50]. A recent approach to address this problem is to use transfer learning, i.e., pre-training a model with unsupervised learning on a large, general corpus, followed by fine-tuning via supervised learning towards the downstream task.…”
Mentioning (confidence: 99%)