2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)
DOI: 10.1109/icse43902.2021.00040

Traceability Transformed: Generating More Accurate Links with Pre-Trained BERT Models

Abstract: Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed the use of deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been restricted by the availability of labeled data and efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts.…
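
The approach the abstract outlines, scoring candidate links between a natural language artifact and a piece of source code with a pre-trained BERT-family encoder, can be sketched roughly as below. This is a minimal illustration under assumptions not stated in the abstract: the checkpoint (microsoft/codebert-base), mean pooling, and cosine scoring are stand-ins, not T-BERT's actual architectures, which are fine-tuned on labeled trace links.

```python
# Rough sketch: score a code snippet against a natural language artifact with a
# pre-trained encoder. Checkpoint, pooling, and cosine scoring are assumptions
# for illustration, not the T-BERT architecture from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "microsoft/codebert-base"  # assumed BERT-family checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed(text: str) -> torch.Tensor:
    """Encode text and mean-pool token embeddings into a single vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state          # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()      # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)        # (1, dim)

issue = "Fix crash when the config file path contains spaces"
snippet = "def load_config(path):\n    with open(path) as f:\n        return json.load(f)"

score = torch.nn.functional.cosine_similarity(embed(issue), embed(snippet)).item()
print(f"candidate trace-link score: {score:.3f}")
```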

Cited by 85 publications (72 citation statements). References 27 publications.
“…Another alternative is to define suitable intermediate training tasks. We have found initial evidence of this, and a recent paper adds further evidence, in the context of traceability [83]. However, it used a very similar task and dataset.…”
Section: Intermediate Task Training (mentioning)
confidence: 57%
“…Outside of code authoring, transfer learning has been enabled by Transformer-based models in other software artifacts. Lin et al [24] apply pretrained BERT models for learning relationships between issues and commits in a software repository. Sharma et al [25] detect code smells in programming languages where sufficient training data is not available by transferring knowledge from other data-rich languages.…”
Section: Results (mentioning)
confidence: 99%
“…In the future, we plan to consider more diverse information of a post into account, such as the attached pictures, author information, etc. Also, we are interested in applying PTM4Tag on more SQA sites such as Freecode, AskUbuntu, etc., to further evaluate its effectiveness and generalizability. We release our replication package to facilitate future research.…”
Section: Discussion (mentioning)
confidence: 99%
“…Recent trends in the NLP domain have led to the rapid development of transfer learning. Especially, substantial work has shown that pre-trained language models learn practical and generic language representations which could achieve outstanding performance in various downstream tasks simply by fine-tuning, i.e., without training a new model from scratch [10,15,20]. With proper training manner, the model can effectively capture the semantics of individual words based on their surrounding context and reflect the meaning of the whole sentence.…”
Section: Pre-trained Language Models (mentioning)
confidence: 99%
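
For readers unfamiliar with the fine-tuning workflow this statement refers to, the sketch below shows the general pattern of adapting a pre-trained language model to a downstream text-pair classification task instead of training a model from scratch. The checkpoint, the toy dataset, and the hyperparameters are placeholder assumptions (using the Hugging Face transformers and datasets libraries), not a setup taken from any of the cited papers.

```python
# Generic fine-tuning sketch: adapt a pre-trained BERT-family model to a
# downstream text-pair classification task. All data and hyperparameters are
# placeholders for illustration only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy labeled pairs: (natural language artifact, candidate artifact, linked?).
data = Dataset.from_dict({
    "text_a": ["Fix crash on empty config", "Add dark mode toggle"],
    "text_b": ["def load_config(path): ...", "def export_csv(rows): ..."],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text_a"], batch["text_b"],
                     truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()  # fine-tunes all layers on the downstream pairs
```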