Background In December 2019, the COVID-19 outbreak started in China and rapidly spread around the world. The lack of a vaccine or an optimized intervention raised the importance of characterizing risk factors and symptoms for the early identification and successful treatment of patients with COVID-19. Objective This study aims to investigate and analyze biomedical literature and public social media data to understand the association of risk factors and symptoms with the various outcomes observed in patients with COVID-19. Methods Through semantic analysis, we collected 45 retrospective cohort studies, which evaluated 303 clinical and demographic variables across 13 different outcomes of patients with COVID-19, and 84,140 Twitter posts from 1036 COVID-19–positive users. Machine learning tools to extract biomedical information were applied to identify mentions of uncommon or novel symptoms in tweets. We then examined and compared the two data sets to expand our landscape of risk factors and symptoms related to COVID-19. Results In the biomedical literature, approximately 90% of clinical and demographic variables showed inconsistent associations with COVID-19 outcomes. Consensus analysis identified 72 risk factors that were specifically associated with individual outcomes. From the social media data, 51 symptoms were characterized and analyzed. By comparing the social media data with the biomedical literature, we identified 25 novel symptoms that were specifically mentioned in tweets but had not been previously well characterized. Furthermore, certain combinations of symptoms were frequently mentioned together on social media. Conclusions The identified outcome-specific risk factors, symptoms, and combinations of symptoms may serve as surrogate indicators to identify patients with COVID-19 and predict their clinical outcomes so that appropriate treatments can be provided.
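The abstract reports that certain combinations of symptoms were frequently mentioned together in tweets, but it does not say how co-mention was measured. A minimal sketch, assuming each tweet has already been reduced to a set of extracted symptom names by an upstream extraction step (not shown), is to count unordered symptom pairs per tweet. The function name `symptom_pair_counts` and the toy tweets are hypothetical and are not drawn from the study.

```python
from collections import Counter
from itertools import combinations


def symptom_pair_counts(tweets_symptoms):
    """Count how often each unordered pair of symptoms appears in the same tweet.

    tweets_symptoms: iterable of sets, one set of extracted symptom names per
    tweet (assumed to be produced by an upstream extraction step).
    """
    counts = Counter()
    for symptoms in tweets_symptoms:
        # Sorting makes ("cough", "fever") and ("fever", "cough") the same key.
        for pair in combinations(sorted(set(symptoms)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    tweets = [
        {"fever", "cough"},
        {"fever", "cough", "loss of smell"},
        {"fatigue"},
        {"loss of smell", "loss of taste"},
    ]
    for pair, n in symptom_pair_counts(tweets).most_common(3):
        print(pair, n)
```

Raw pair counts favor common symptoms; a real analysis would likely normalize them (for example, by each symptom's individual frequency) before calling a combination notable.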
Background In December 2019, the COVID-19 outbreak started in China and rapidly spread around the world. Many studies have been conducted to understand the clinical characteristics of COVID-19, and, more recently, the postinfection sequelae of the disease have begun to be investigated. However, there is little consensus on the longitudinal changes in lasting physical or psychological symptoms following COVID-19 infection. Objective This study aims to investigate and analyze public social media data from Reddit to understand the longitudinal impact of COVID-19 symptoms before and after recovery from COVID-19. Methods We collected 22,890 Reddit posts that were generated by 14,401 authors from March 14 to December 16, 2020. Using active learning and intensive manual inspection, we identified 292 (2.03%) active authors who had been infected with COVID-19 and frequently reported their disease progress on Reddit, along with their 2213 (9.67%) longitudinal posts. Machine learning tools to extract biomedical information were applied to identify COVID-19 symptoms mentioned in the Reddit posts. We then examined longitudinal changes in individual physiological and psychological characteristics before and after recovery from COVID-19 infection. Results In total, 58 physiological and 3 psychological symptoms were identified on social media before and after recovery from COVID-19 infection. From the analyses, we found that symptoms of patients with COVID-19 lasted 2.5 months on average: they appeared around a month before recovery and remained for 1.5 months after recovery. Well-known COVID-19 symptoms, such as fever, cough, and chest congestion, appeared relatively early in patient journeys and were frequently observed before recovery from COVID-19. Meanwhile, mental discomfort or distress (such as brain fog or stress), fatigue, and manifestations on the toes or fingers were frequently mentioned after recovery and remained as intermediate- and longer-term sequelae.
Conclusions In this study, we showed the dynamic changes in COVID-19 symptoms during the infection and recovery phases of the disease. Our findings suggest the feasibility of using social media data for investigating disease states and understanding the evolution of the physiological and psychological characteristics of COVID-19 infection over time.
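The results place each symptom on a timeline relative to recovery (for example, fever mostly before it and brain fog mostly after it). One way such timing could be summarized is sketched below; it is an illustration, not the authors' method. It assumes each post has been tagged with extracted symptoms and each author has a self-reported recovery date. The input layout, the function name `mean_offset_by_symptom`, and the toy data are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_offset_by_symptom(posts, recovery_dates):
    """Average, per symptom, the number of days between a mention and recovery.

    posts: list of (author, post_date, set_of_symptoms) tuples (assumed layout).
    recovery_dates: dict mapping author -> date the author reported recovery.
    Negative values mean the symptom is mentioned, on average, before recovery.
    """
    offsets = defaultdict(list)
    for author, post_date, symptoms in posts:
        if author not in recovery_dates:
            continue  # skip authors with no known recovery date
        delta = (post_date - recovery_dates[author]).days
        for symptom in symptoms:
            offsets[symptom].append(delta)
    return {symptom: mean(days) for symptom, days in offsets.items()}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    posts = [
        ("author_a", date(2020, 4, 1), {"fever", "cough"}),
        ("author_a", date(2020, 5, 1), {"fatigue"}),
        ("author_b", date(2020, 6, 10), {"brain fog"}),
    ]
    recovery = {"author_a": date(2020, 4, 15), "author_b": date(2020, 6, 1)}
    print(mean_offset_by_symptom(posts, recovery))
```

A per-symptom mean hides spread; reporting the first and last mention per author as well would show how long each symptom persists.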
ABSTRACT There is growing interest in systems that generate timeline summaries by filtering high-volume streams of documents to retain only those that are relevant to a particular event or topic. Continued advances in algorithms and techniques for this task depend on standardized and reproducible evaluation methodologies for comparing systems. However, timeline summary evaluation is still in its infancy, with competing methodologies currently being explored in international evaluation forums such as TREC. One area of active exploration is how to explicitly represent the units of information that should appear in a "good" summary. Currently, there are two main approaches, one based on identifying nuggets in an external "ground truth", and the other based on clustering system outputs. In this paper, by building test collections that have both nugget and cluster annotations, we are able to compare these two approaches. Specifically, we address questions related to evaluation effort, differences in the final evaluation products, and correlations between scores and rankings generated by both approaches. We summarize advantages and disadvantages of nuggets and clusters to offer recommendations for future system evaluations.
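The abstract mentions correlations between the system rankings produced by nugget-based and cluster-based evaluation without naming the statistic. Kendall's tau is a common choice for comparing system rankings in TREC-style work, so the sketch below uses it as an assumption. It implements tau-a over two score dictionaries; the function name `kendall_tau` and the example scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two evaluations that score the same systems.

    scores_a, scores_b: dict mapping system_id -> score.
    Tied pairs count as neither concordant nor discordant (tau-a applies no
    tie correction).
    """
    systems = sorted(scores_a)
    if set(systems) != set(scores_b):
        raise ValueError("both evaluations must score the same systems")
    n_pairs = len(systems) * (len(systems) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two systems to compare rankings")
    concordant = discordant = 0
    for x, y in combinations(systems, 2):
        # Same sign in both evaluations -> the pair is ordered the same way.
        sign = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Hypothetical scores for four systems; not results from the paper.
    nugget = {"s1": 0.42, "s2": 0.35, "s3": 0.30, "s4": 0.12}
    cluster = {"s1": 0.50, "s2": 0.28, "s3": 0.33, "s4": 0.10}
    print(kendall_tau(nugget, cluster))
```

Here only the `s2`/`s3` pair swaps order, so five of six pairs agree. If ties are frequent, tau-b (for example, `scipy.stats.kendalltau`) is the usual alternative.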
We examine the effects of expanding a judged set of sentences with their duplicates from a corpus. Including new sentences that are exact duplicates of previously judged sentences may allow for better estimation of performance metrics and enhance the reusability of a test collection. We perform experiments in the context of the Temporal Summarization Track at TREC 2013. We find that adding duplicate sentences to the judged set does not significantly affect relative system performance. However, we do find statistically significant changes in the performance of nearly half the systems that participated in the Track. We recommend adding exact duplicate sentences to the set of relevance judgements in order to obtain a more accurate estimate of system performance.
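Expanding the judged set with exact duplicates amounts to copying each judged sentence's label onto every unjudged sentence whose text is identical. A minimal sketch follows, assuming judgments and corpus sentences are keyed by sentence ID and that "exact" means an identical string with no normalization. The function name `expand_judgments`, the ID format, and the sample sentences are hypothetical.

```python
def expand_judgments(judged, corpus):
    """Copy relevance labels from judged sentences to their exact duplicates.

    judged: dict mapping sentence_id -> relevance label for assessed sentences.
    corpus: dict mapping sentence_id -> sentence text (includes judged ones).
    Returns a new dict with the original judgments plus inherited labels.
    """
    # Index labels by exact sentence text. If identical texts were judged
    # differently, the first label encountered is kept (a simplifying choice).
    label_by_text = {}
    for sid, label in judged.items():
        label_by_text.setdefault(corpus[sid], label)

    expanded = dict(judged)
    for sid, text in corpus.items():
        if sid not in expanded and text in label_by_text:
            expanded[sid] = label_by_text[text]
    return expanded


if __name__ == "__main__":
    # Hypothetical sentence IDs and text, for illustration only.
    corpus = {
        "doc1-s1": "Quake hits city.",
        "doc2-s4": "Quake hits city.",
        "doc3-s2": "Rescue under way.",
        "doc4-s9": "Unrelated text.",
    }
    judged = {"doc1-s1": 1, "doc3-s2": 0}
    print(expand_judgments(judged, corpus))
```

Exact string matching is the strictest reading of "exact duplicate"; normalizing whitespace or case first would capture more near-copies, at the risk of propagating labels to sentences an assessor never saw.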