In this work we address the problem of generic automated disease incidence monitoring on Twitter. We employ an ontology of disease-related concepts and use it to obtain a conceptual representation of tweets. Unlike previous keyword-based systems and topic modeling approaches, our ontological approach allows us to apply more stringent criteria, such as spatial and temporal characteristics, for determining which messages are relevant, while giving a stronger guarantee that the resulting models will perform well on new data that may be lexically divergent. We achieve this by training learners on concepts rather than individual words. For training we use a dataset containing mentions of influenza and Listeria, and we use the learned models to classify datasets containing mentions of an arbitrary selection of other diseases. We show that our ontological approach achieves good performance on this task using a variety of natural language processing techniques. We also show that word vectors can be learned directly from our concepts to achieve even better results.
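As a rough illustration of training on concepts rather than individual words, the sketch below maps tweet tokens to concept labels before any learner sees them. The ontology entries, the concept names, and the `to_concepts` helper are all hypothetical and far simpler than the ontology the abstract describes; the only point is that two lexically different tweets can share one conceptual representation.

```python
import re

# Hypothetical toy ontology: surface term -> concept label.
# These entries are illustrative assumptions, not taken from the paper.
TOY_ONTOLOGY = {
    "flu": "DISEASE",
    "influenza": "DISEASE",
    "listeria": "DISEASE",
    "measles": "DISEASE",
    "fever": "SYMPTOM",
    "cough": "SYMPTOM",
    "today": "TIME",
    "yesterday": "TIME",
    "i": "SELF",
    "my": "SELF",
}


def to_concepts(tweet):
    """Replace each token with its concept label, or OTHER if unknown."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return [TOY_ONTOLOGY.get(token, "OTHER") for token in tokens]


# A disease seen in training and one that was not yield the same sequence.
seen = to_concepts("I have the flu today")
unseen = to_concepts("I have the measles today")
```

A classifier trained on such sequences never sees the disease name itself, which is one plausible reading of why models trained on influenza and Listeria could transfer to other diseases.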
Twitter, and social media as a whole, have great potential as a source of disease surveillance data; however, the general messiness of tweets presents several challenges for standard information extraction methods. Most deployed systems rely on simple keyword matching and do not distinguish between relevant and irrelevant keyword mentions. This makes them susceptible to false positives, because keyword volume can be influenced by social phenomena that are unrelated to disease occurrence. Furthermore, most solutions are intended for a single language, and those meant for multilingual scenarios do not incorporate semantic context. In this paper we experimentally examine different approaches for classifying text for epidemiological surveillance on the social web. In addition, we offer a systematic comparison of the impact of different input representations on performance. Specifically, we compare continuous representations against one-hot encoding for word-based units, class-based (ontology-based) units, and subword units in the form of byte pair encodings. We also establish the desirable performance characteristics for multilingual semantic filtering approaches and offer an in-depth discussion of the implications for end-to-end surveillance.
Typhoid disease continues to be a global public health burden. Uganda is one of the African countries characterized by a high incidence of typhoid disease. Over 80% of Ugandan districts are endemic for typhoid, largely attributable to a lack of reliable knowledge to support disease surveillance. Spatial-temporal studies exploring major characteristics of the disease within the local population have remained limited in Uganda. The main goal of the study was to reveal spatial-temporal trends and distribution patterns of typhoid disease in Uganda for the period 2012 to 2017. Spatial-temporal statistics revealed monthly and annual trends of the disease at both regional and national levels. Results show that outbreaks occurred during 2015 and 2017 in the central and eastern regions, respectively. The spatial scan statistic using the discrete Poisson model revealed spatial clusters of the disease for each of the years from 2012 to 2017, together with the populations at risk. Most of the disease clustering was in the central region, followed by the western and eastern regions (P < 0.01). The northern region was the safest throughout the study period. This knowledge helps surveillance teams to i) plan and enforce preventive measures; ii) effectively prepare for outbreaks; iii) make targeted interventions for resource optimization; and iv) evaluate the effectiveness of the intervention methods in the study period. This exploratory research forms a foundation for using Geographical Information Systems (GIS) in subsequent related studies to discover hidden spatial patterns that are difficult to discover with conventional methods.
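For readers unfamiliar with the spatial scan statistic, the following is a minimal sketch of the log-likelihood ratio that the discrete Poisson model assigns to one candidate cluster, in the standard form usually attributed to Kulldorff. The function name and its parameters are assumptions made for illustration; the abstract does not say which software or settings the study used, and the Monte Carlo step that produces P values is omitted.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio for one candidate zone (discrete Poisson model).

    cases_in:     observed cases inside the zone
    expected_in:  cases expected inside the zone under the null hypothesis,
                  e.g. total_cases * zone_population / total_population
    total_cases:  observed cases in the whole study area

    Only zones with an excess of cases score above zero.
    Assumes 0 < expected_in < total_cases.
    """
    c, e, n = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # When every case falls inside the zone the outside term vanishes.
    outside = 0.0 if c == n else (n - c) * math.log((n - c) / (n - e))
    return inside + outside


# Hypothetical numbers: 20 cases observed where 10 were expected, 100 overall.
score = poisson_llr(20, 10.0, 100)
```

In a full scan, this score would be computed for many candidate zones, and the zone with the highest value would be reported as the most likely cluster.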