Background: Health care social media used for health information exchange and emotional communication involves different types of users, including patients, caregivers, and health professionals. However, it is difficult to identify different stakeholders because user identification data are lacking due to privacy protection and proprietary interests. Therefore, identifying the concerns of different stakeholders and how they use health care social media when confronted with huge amounts of health-related messages posted by users is a critical problem.

Objective: We aimed to develop a new content analysis method using text mining techniques applied in health care social media to (1) identify different health care stakeholders, (2) determine hot topics of concern, and (3) measure sentiment expression by different stakeholders.

Methods: We collected 138,161 messages posted by 39,606 members in lung cancer, diabetes, and breast cancer forums in the online community MedHelp.org over 10 years (January 2007 to October 2016) as experimental data. We used text mining techniques to process text data to identify different stakeholders and determine health-related hot topics, and then analyzed sentiment expression.

Results: We identified 3 significantly different stakeholder groups using expectation maximization clustering (3 performance metrics: Rand=0.802, Jaccard=0.393, Fowlkes-Mallows=0.537; P<.001), in which patients (24,429/39,606, 61.68%) and caregivers (12,232/39,606, 30.88%) represented the majority of the population, in contrast to specialists (2945/39,606, 7.43%). We identified 5 significantly different health-related topics: symptom, examination, drug, procedure, and complication (Rand=0.783, Jaccard=0.369, Fowlkes-Mallows=0.495; P<.001). Patients were concerned most about symptom topics related to lung cancer (536/1657, 32.34%), drug topics related to diabetes (1883/5904, 31.89%), and examination topics related to breast cancer (8728/23,934, 36.47%).
By comparison, caregivers were more concerned about drug topics related to lung cancer (300/2721, 11.03% vs 109/1657, 6.58%), procedure topics related to breast cancer (3952/13,954, 28.32% vs 5822/23,934, 24.33%), and complication topics (4449/25,701, 17.31% vs 4070/31,495, 12.92%). In addition, patients (9040/36,081, 25.05%) were more likely than caregivers (2659/18,470, 14.39%) and specialists (17,943/83,610, 21.46%) to express their emotions. However, patients’ sentiment intensity score (2.46) was lower than those of caregivers (4.66) and specialists (5.14). In particular, for caregivers, negative sentiment scores were higher than positive scores (2.56 vs 2.18), with the opposite among specialists (2.62 vs 2.46). Overall, the proportion of negative messages was greater than that of positive messages related to symptom, complication, and examination. The pattern was opposite for drug and procedure topics. A trend analysis showed that patients and caregivers gradually changed their emotional state in a positive direction.

Conclusions: The hot topics of interest and sentiment expr...