2021
DOI: 10.48550/arxiv.2104.05861
Preprint

Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews

Abstract: Context: Mobile app reviews written by users on app stores or social media are significant resources for app developers. Analyzing app reviews has proved to be useful for many areas of software engineering (e.g., requirements engineering, testing). Automatic classification of app reviews requires extensive effort to manually curate a labeled dataset. When the classification purpose changes (e.g., identifying bugs versus usability issues or sentiment), new datasets should be labeled, which prevents the extensib…
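The zero-shot setting mentioned in the abstract can be illustrated with a short sketch. The snippet below uses an off-the-shelf NLI-based zero-shot classifier from the Hugging Face transformers library; the checkpoint and the candidate label set are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal zero-shot sketch: the label set is supplied at inference time,
# so changing the classification purpose needs no newly labeled dataset.
# Checkpoint and labels below are assumptions, not the paper's exact setup.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

review = "The app keeps freezing after the latest update."
candidate_labels = ["bug report", "feature request", "usability issue", "praise"]

result = classifier(review, candidate_labels=candidate_labels)
# result["labels"] is sorted by score; the first entry is the predicted class.
print(result["labels"][0], round(result["scores"][0], 3))
```

Swapping in a different label set (e.g., sentiment polarities) changes the task without curating a new dataset, which is the motivation the abstract points to.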

Cited by 3 publications (6 citation statements)
References 44 publications (93 reference statements)

“…Among these three, only RRGEN model is available. There are a few studies in software engineering that evaluate the capability of PTMs for sentiment analysis [7], user feedback analysis [8], and programming and natural language tasks [9]. Our work is different with these studies as we are the first to investigate the application of pre-trained language models and Transformers for app review response generation.…”
Section: Related Work (mentioning; confidence: 99%)
“…The advantages of using PTMs in software engineering are explored for sentiment classification and code-related tasks (e.g. comment generation) [7,8,9]. However, there is no study that evaluates their performance for app review response generation.…”
Section: Introduction (mentioning; confidence: 99%)
“…Similarly, Henao et al demonstrated the increase in performance in user feedback classification when using pre-trained language models over both classical models as well as other deep models [16]. Hadi and Fard proposed a study where the classification accuracy of pre-trained language models is compared against that of previously constructed classifiers from the literature as well as exploring the effect of self-supervised pre-training, binary classification, multi-class classification, and zero-shot settings on classification performance [15]. Dhinakaran et al showed that models trained on training data that was chosen randomly were found to consistently underperform more sophisticated training data selection techniques, such as active learning [10].…”
Section: Comparisons of Classification Techniques (mentioning; confidence: 99%)
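For contrast with the zero-shot sketch above, the supervised setting discussed in the statement above can be sketched as a minimal fine-tuning loop with the Hugging Face Trainer API; the checkpoint, toy data, binary label scheme, and hyperparameters are assumptions for illustration, not the configuration evaluated in the cited study.

```python
# Minimal fine-tuning sketch (assumed checkpoint, toy labels, and hyperparameters;
# not the exact setup evaluated in the cited study).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

reviews = ["App crashes whenever I open the camera.",   # toy examples
           "Love the new dark mode, great job!"]
labels = [1, 0]  # 1 = problem report, 0 = other (hypothetical binary scheme)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch the examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=128)

train_ds = Dataset.from_dict({"text": reviews, "label": labels}).map(tokenize, batched=True)

args = TrainingArguments(output_dir="review-clf", num_train_epochs=1,
                         per_device_train_batch_size=8, logging_steps=1)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```

In practice the same loop would run over a labeled app-review corpus with a held-out split for evaluation, which is the kind of supervised baseline the cited comparison considers.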
“…2. For further context, a zero-shot text classification model (denoted "Zero shot"), as was proposed by Hadi and Fard [15], was also evaluated on each dataset to provide a performance benchmark.…”
Section: Unseen Datasets (RQ2) (mentioning; confidence: 99%)