BACKGROUND
Healthcare providers and health-related researchers face significant challenges when applying sentiment analysis tools to health-related free-text survey data. Most state-of-the-art tools were developed for domains such as social media, and their performance on healthcare text remains largely unknown. Moreover, existing studies indicate that these tools often lack accuracy and produce inconsistent results.
OBJECTIVE
This study aims to address the lack of comparative analysis of sentiment analysis tools applied to health-related free-text survey data in the context of COVID-19. The objective is to automatically predict sentence-level sentiment for two independent COVID-19 survey datasets, from NIH and Stanford University.
METHODS
Gold-standard labels were created for a subset of each dataset using a panel of human raters. We compared eight state-of-the-art sentiment analysis tools on both datasets to evaluate variability and disagreement across tools. Additionally, we explored few-shot learning by fine-tuning OPT (a large language model [LLM] with publicly available weights) on a small annotated subset, and zero-shot learning using ChatGPT (an LLM without publicly available weights).
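The panel-based labeling and cross-tool comparison described above can be sketched as follows. This is a minimal illustration, not the study's actual protocol: the majority-vote rule, the tie-handling convention, and the pairwise-agreement metric are all assumptions introduced here for clarity.

```python
from collections import Counter

def majority_label(ratings):
    """Collapse a panel of human ratings for one sentence into a gold label.

    Returns the majority sentiment, or None when the top labels are tied
    (a tied sentence would need adjudication by the raters).
    """
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority: needs adjudication
    return counts[0][0]

def pairwise_agreement(pred_a, pred_b):
    """Fraction of sentences on which two sentiment tools agree."""
    assert len(pred_a) == len(pred_b)
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)
```

For example, `majority_label(["positive", "positive", "negative"])` yields `"positive"`, and computing `pairwise_agreement` over every pair of the eight tools gives a simple picture of the cross-tool disagreement the study reports.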
RESULTS
The comparison of sentiment analysis tools revealed high variability and disagreement across the evaluated tools when applied to health-related survey data. OPT and ChatGPT demonstrated superior performance, outperforming all other sentiment analysis tools. Moreover, ChatGPT outperformed OPT by 6% in accuracy and by 4% to 7% in F-score.
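The accuracy and F-score comparisons above can be computed per tool against the gold-standard labels; a minimal sketch, with illustrative data rather than the study's actual predictions, is:

```python
def accuracy(gold, pred):
    """Fraction of sentences where the tool matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f_score(gold, pred, label):
    """F1 for one sentiment class, from precision and recall."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Reporting F1 per class (positive, negative, neutral) rather than accuracy alone matters here, since health-survey sentiment labels are typically imbalanced.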
CONCLUSIONS
The findings suggest that LLMs are a viable method for predicting sentiment in health surveys. The comparative analysis highlights the potential of LLMs to reduce the need for human labor in dataset annotation, or to redeploy that labor toward quality control of LLM predictions. The study demonstrates the effectiveness of LLMs, particularly the few-shot and zero-shot learning approaches, in sentiment analysis of health-related survey data. These results have implications for saving human labor and improving efficiency in sentiment analysis tasks, contributing to advancements in automated sentiment analysis.