Natural language processing in mining unstructured data from software repositories: a review

Gupta, Som

doi:10.1007/s12046-019-1223-9

Cited by 12 publications

(2 citation statements)

References 62 publications

“…A more recent review was done, broadly focusing on adopting NLP to mine unstructured data in software repositories (Gupta and Gupta, 2019). The review was done by looking into general applications of mining repositories, with a sub-focus on traceability efforts.…”

Section: Related Work (mentioning)

confidence: 99%

Applications of Natural Language Processing in Software Traceability: A Systematic Mapping Study

Pauzi

Capiluppi

2022

SSRN Journal

“…These are neural networks that have been trained on large amounts of text data. They can be tuned to perform well on specific tasks, such as sentiment analysis [3], [4].…”

Section: Introduction (mentioning)

confidence: 99%

Fine-tuning Pretrained Transformers for Sentiment Analysis on Twitter Data

Anupriya

2021

MSEA

Noise and informal language make sentiment analysis on Twitter data difficult. In recent years, several transformer models have been developed that perform well on this type of task, and this study analyzes their performance on Twitter data. The study uses a publicly available dataset of tweets labeled with neutral, positive, or negative sentiment, preprocesses the data, and tokenizes it using WordPiece. Three transformer models are then fine-tuned on the labeled tweets, starting from the models' pretrained weights from large language modeling projects. The models are trained on a 5-phased scale. Performance was evaluated using several metrics, including accuracy, recall, and F1 score. The models performed well overall: ELECTRA led with an accuracy of 85.8%, while XLNet and BERT reached 84.3% and 84.5%, respectively. The study also examined the impact of hyperparameters and found that batch size and learning rate have a significant effect, with the models performing better at larger batch sizes and lower learning rates. The study concludes that the three pretrained transformer models, XLNet, ELECTRA, and BERT, perform well at sentiment analysis on Twitter data, and that the findings can benefit those working on sentiment analysis for social media platforms.
