In May 2011, an outbreak of enterohemorrhagic Escherichia coli (EHEC) occurred in northern Germany. The Shiga toxin-producing strain O104:H4 infected several thousand people, frequently leading to hemolytic uremic syndrome (HUS) and gastroenteritis (GI). The first reports of the outbreak appeared in the German media on Saturday, 21 May 2011; media attention rose to high levels over the following two weeks, with up to 2000 articles per day categorized by the automatic threat detection system MedISys (Medical Information System). In this article, we illustrate how MedISys detected the sudden increase in reporting on E. coli on 21 May and how automatic analysis of the reporting provided epidemic intelligence for following the event. Categorization, filtering and clustering made it possible to identify different aspects of the unfolding news event and to analyze general media and official sites in parallel.
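As an illustration of what detecting a sudden increase in reporting can look like, the following is a minimal sketch that flags days on which the article count exceeds a trailing baseline. It is not the MedISys alerting method, which the abstract does not describe; the function name `detect_spikes`, the 7-day window, the threshold of 3 standard deviations and the sample counts are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indices of days whose article count exceeds the
    trailing mean by more than `threshold` standard deviations.

    Hypothetical helper for illustration only; the window and
    threshold values are assumptions, not MedISys parameters.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Floor the spread at 1.0 so a perfectly flat baseline does not
        # turn every one-article fluctuation into an alert.
        limit = baseline + threshold * max(spread, 1.0)
        if daily_counts[i] > limit:
            spikes.append(i)
    return spikes


# Invented daily article counts: a quiet baseline followed by a surge.
counts = [4, 6, 5, 3, 5, 4, 6, 5, 120, 900, 2000]
print(detect_spikes(counts))
```

Because the baseline is recomputed from the preceding days only, a sustained surge keeps being flagged as long as each day clearly exceeds the recent history, which matches the pattern of attention rising over several days.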
The Medical Information System (MedISys) is a fully automatic 24/7 public health surveillance system that monitors human and animal infectious diseases and chemical, biological, radiological and nuclear (CBRN) threats in open-source media. In this article, we explain the technology behind MedISys, describing the processing chain from the definition of news sources, scraping and grabbing articles from the internet, text mining, event extraction with the Pattern-based Understanding and Learning System (PULS, developed by the University of Helsinki), news clustering and alerting, to the display of results. The web interface and service applications are shown from a user’s perspective. Via their browser, users can display world maps in which event locations are highlighted, as well as statistics on the reporting about diseases, countries and combinations thereof. They can apply filters for language, disease or location, or filters with orthogonal categories such as outbreaks. Specific entities such as persons, organizations and locations are identified automatically.
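The combination of language, disease and location filters with an orthogonal category such as outbreaks can be sketched as a conjunction of optional conditions over categorized articles. This is a minimal illustration under assumed names, not the MedISys data model or API: the `Article` class, its fields and `filter_articles` are hypothetical, and the sample articles are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record for an already categorized news article."""
    title: str
    language: str
    diseases: set = field(default_factory=set)
    locations: set = field(default_factory=set)
    categories: set = field(default_factory=set)  # e.g. {"outbreak"}


def filter_articles(articles, language=None, disease=None,
                    location=None, category=None):
    """Keep articles that satisfy every filter that was supplied.

    A filter left as None is ignored, so the filters combine freely.
    """
    selected = []
    for article in articles:
        if language is not None and article.language != language:
            continue
        if disease is not None and disease not in article.diseases:
            continue
        if location is not None and location not in article.locations:
            continue
        if category is not None and category not in article.categories:
            continue
        selected.append(article)
    return selected


# Invented sample data for the example.
articles = [
    Article("EHEC cases rise in Hamburg", "de",
            {"E. coli"}, {"Germany"}, {"outbreak"}),
    Article("E. coli research funding announced", "en",
            {"E. coli"}, {"Germany"}),
    Article("Measles cluster reported", "en",
            {"measles"}, {"France"}, {"outbreak"}),
]

hits = filter_articles(articles, disease="E. coli", category="outbreak")
print([a.title for a in hits])
```

Treating the category as just another optional condition is what makes it orthogonal here: it narrows any disease or location query without being tied to a particular one.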
With the rapid spread of the COVID-19 pandemic, the novel Meaningful Integration of Data Analytics and Services (MIDAS) platform has quickly demonstrated its value, relevance and transferability to this new global crisis. The MIDAS platform enables the connection of a large number of isolated heterogeneous data sources, and combines rich datasets including open and social data, ingesting and preparing these for the application of analytics, monitoring and research tools. The platform will assist public health authorities in: (i) better understanding the disease and its impact; (ii) monitoring the different aspects of the evolution of the pandemic across a diverse range of groups; (iii) contributing to improved resilience against the impacts of this global crisis; and (iv) enhancing preparedness for future public health emergencies. The model of governance and ethical review, incorporated and defined within MIDAS, also addresses the complex privacy and ethical issues that the developing pandemic has highlighted, allowing oversight and scrutiny of more and richer data sources by users of the system.