Background: Older adults have a high rate of loneliness, which contributes to increased psychosocial risk, medical morbidity, and mortality. Digital emotional support interventions provide a convenient and rapid avenue for additional support. Digital peer support interventions for emotional struggles contrast with the usual provider-based clinical care models because they offer more accessible, direct support for empowerment, highlighting users' autonomy, competence, and relatedness.

Objective: This study aims to examine a novel anonymous, synchronous, peer-to-peer digital chat service facilitated by trained human moderators. The experience of a cohort of 699 adults aged ≥65 years was analyzed to determine (1) whether participation alone led to measurable aggregate change in momentary loneliness and optimism and (2) the impact of peers on momentary loneliness and optimism.

Methods: Participants were each prompted with a single question: "What's your struggle?" A proprietary artificial intelligence model used the free-text response to automatically match each respondent, based on their self-expressed emotional struggle, to peers and a chat moderator. Exchanged messages were analyzed to quantitatively measure the change in momentary loneliness and optimism using a third-party, public, natural language processing model (GPT-4 [OpenAI]). The sentiment change analysis was performed at the individual level and then averaged across all users with similar emotion types to produce a statistically significant (P<.05) collective trend per emotion. To evaluate the peer impact on momentary loneliness and optimism, we performed propensity matching to align the moderator+single user and moderator+small group chat cohorts and then compared the emotion trends between the matched cohorts.

Results: Loneliness and optimism trends significantly improved after 8 minutes (P=.02) to 9 minutes (P=.03) into the chat. We observed a significant improvement in the momentary loneliness and optimism trends in the moderator+small group cohort compared with the moderator+single user cohort after 19 minutes (P=.049) for optimism and 21 minutes (P=.04) for loneliness.

Conclusions: Chat-based peer support may be a viable intervention to help address momentary loneliness in older adults and may present an alternative to traditional care. These promising results support further study to expand the evidence for such cost-effective options.
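The methods describe scoring per-user emotion trends and then comparing cohorts after propensity matching. The sketch below illustrates that general workflow; it is a minimal illustration on synthetic data, and the covariates, per-user trend slopes, matching scheme, and test are all assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of a propensity-matched trend comparison of the kind the
# abstract describes. All covariates and data here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from scipy import stats

rng = np.random.default_rng(0)
n = 699

# Hypothetical per-user covariates and cohort labels
# (1 = moderator+small group, 0 = moderator+single user).
X = rng.normal(size=(n, 3))          # e.g., age, baseline score, chat length
group = rng.integers(0, 2, size=n)
# Hypothetical per-user emotion trend slopes (e.g., optimism per minute).
slope = rng.normal(loc=0.02 + 0.01 * group, scale=0.05)

# 1. Estimate propensity scores for cohort membership.
ps = LogisticRegression().fit(X, group).predict_proba(X)[:, 1]

# 2. Match each small-group user to the nearest single-user chat by score.
treated, control = np.where(group == 1)[0], np.where(group == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_control = control[idx.ravel()]

# 3. Compare per-user emotion trends between the matched cohorts.
t, p = stats.ttest_ind(slope[treated], slope[matched_control])
print(f"trend difference: t={t:.2f}, p={p:.3f}")
```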
BACKGROUND: Large language models (LLMs) have the potential to improve the accessibility and quality of medical information for patients. Assessing the quality of LLM-generated responses in real-world clinical settings is crucial for determining their suitability and for optimizing healthcare efficiency.

OBJECTIVE: This study aims to comprehensively evaluate the reliability of responses generated by an LLM-driven chatbot compared with those written by physicians, and to assess whether artificial intelligence (AI) can enhance the quality of otorhinolaryngological advice in complex, nuanced, text-based workflows.

METHODS: Inquiries and verified physician responses related to otorhinolaryngology posted on a public social media forum between December 20 and 21, 2023, were extracted and anonymized. ChatGPT-4 was tasked with generating responses to each inquiry. A panel of seven board-certified otorhinolaryngologists evaluated both physician and ChatGPT-4 responses in a masked, randomized manner. Responses were assessed on six criteria: overall quality, empathy, alignment with medical consensus, accuracy or appropriateness of information, inquiry comprehension, and potential harm. Logistic regression analysis was employed to identify predictors of preference for ChatGPT-4 responses and their influence on overall quality.

RESULTS: A total of 60 question–response pairs were included in the analysis. ChatGPT-4 responses were significantly longer (median: 162 words) than physician responses (median: 67 words; p<.0001). The expert panel preferred ChatGPT-4-generated responses in 70.7% of cases, and ChatGPT-4 responses were rated higher across all six criteria. Multivariate analysis identified significant predictors of preference for ChatGPT-4 responses: alignment with medical consensus (odds ratio [OR]: 2.783), incorrect or inappropriate information (OR: 2.540), and empathy (OR: 1.362). For physician responses, alignment with medical consensus (OR: 1.477), empathy (OR: 1.089), inquiry comprehension (OR: 0.529), and word count (OR: 0.007) significantly influenced overall quality. For chatbot responses, empathy (OR: 1.209), information appropriateness (OR: 0.903), and alignment with medical consensus (OR: 0.768) were significantly associated with high-quality ratings.

CONCLUSIONS: ChatGPT-4 outperformed physicians in generating high-quality responses in this setting. Integrating AI into clinical workflows may therefore enhance the quality of physicians' responses by improving comprehension of complex inquiries and providing more detailed information.
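The odds ratios above come from a logistic regression on the panel's ratings. As a rough illustration of how such ORs are obtained, here is a minimal sketch on synthetic ratings; the column names, rating scales, and data-generating step are assumptions for illustration only, not the study's actual variables.

```python
# Minimal sketch: logistic regression for predictors of panel preference,
# reporting exponentiated coefficients as odds ratios. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60 * 7  # 60 question-response pairs x 7 raters (illustrative)

df = pd.DataFrame({
    "consensus":  rng.integers(1, 6, n),    # alignment with medical consensus (1-5)
    "empathy":    rng.integers(1, 6, n),    # perceived empathy (1-5)
    "word_count": rng.integers(20, 300, n),
})
# Synthetic preference outcome driven by consensus and empathy.
logit_p = -4 + 0.8 * df["consensus"] + 0.3 * df["empathy"]
df["prefers_chatbot"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = sm.Logit(df["prefers_chatbot"],
                 sm.add_constant(df[["consensus", "empathy", "word_count"]])).fit()
print(np.exp(model.params))  # odds ratios, the form reported in the abstract
```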
Introduction: The emergence of large language models (LLMs) has led to significant interest in their potential use as medical assistive tools. Prior investigations have analyzed the overall comparative performance of LLM versions within different ophthalmology subspecialties, but few have characterized LLM performance on image-based questions, a recent advance in LLM capabilities. The purpose of this study was to evaluate the performance of Chat Generative Pre-Trained Transformers (ChatGPT) versions 3.5 and 4.0 on image-based and text-only oculoplastic subspecialty questions from the StatPearls and OphthoQuestions question banks.

Methods: This study utilized 343 text-only questions from StatPearls, 127 image-based questions from StatPearls, and 89 image-based questions from OphthoQuestions, all specific to oculoplastics. The information collected included correctness, distribution of answers, and whether an additional prompt was necessary. Text-only performance was compared between ChatGPT-3.5 and ChatGPT-4.0, and text-only versus multimodal (image-based) performance was compared for ChatGPT-4.0.

Results: ChatGPT-3.5 answered 56.85% (195/343) of text-only questions correctly, while ChatGPT-4.0 achieved 73.46% (252/343), a statistically significant difference in accuracy (p<0.05). The biserial correlation between ChatGPT-3.5 correctness and human performance on the StatPearls question bank was 0.198 (SD 0.195). When ChatGPT-3.5 was incorrect, average human correctness was 49.39% (SD 26.27%); when it was correct, human correctness averaged 57.82% (SD 30.14%) (t=3.57, p=0.0004). For ChatGPT-4.0, the biserial correlation was 0.226 (SD 0.213); when ChatGPT-4.0 was incorrect, human correctness averaged 45.49% (SD 24.85%), and when it was correct, human correctness was 57.02% (SD 29.75%) (t=4.28, p=0.0006). On image-based questions, ChatGPT-4.0 answered 56.94% (123/216) correctly, significantly lower than its performance on text-only questions (p<0.05).

Discussion and conclusion: This study shows that ChatGPT-4.0 performs better on the oculoplastic subspecialty than prior versions. However, significant challenges remain regarding accuracy, particularly when integrating image-based prompts. While LLMs show promise within medical education, further progress must be made regarding their reliability, and caution should be used until further advancement is achieved.
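The correlation reported here relates a binary variable (model correct/incorrect) to a continuous one (per-question human correctness). A minimal sketch of that analysis follows, assuming the reported "biserial correlation" is the point-biserial variant; the arrays are synthetic placeholders rather than the study's per-question statistics.

```python
# Minimal sketch: point-biserial correlation between model correctness and
# average human correctness, plus a t-test splitting questions by model
# correctness. Data are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_questions = 343

# True where the model answered a question correctly (rate ~73%, as for 4.0).
gpt_correct = rng.random(n_questions) < 0.73
# Hypothetical per-question human correctness, slightly higher when the model
# is also correct, mimicking the direction of the reported effect.
human_pct = np.clip(rng.normal(0.45 + 0.12 * gpt_correct, 0.28), 0, 1)

r, p_r = stats.pointbiserialr(gpt_correct, human_pct)
t, p_t = stats.ttest_ind(human_pct[gpt_correct], human_pct[~gpt_correct])
print(f"point-biserial r={r:.3f} (p={p_r:.4f}); t={t:.2f}, p={p_t:.4f}")
```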