Background
To investigate the impacts of the COVID-19 pandemic on the health workforce, we aimed to develop a framework that combines natural language processing (NLP) techniques with human analysis to reduce, organize, classify, and analyze a vast volume of publicly available news articles, complementing the scientific literature and supporting strategic policy dialogue, advocacy, and decision-making.
Objective
This study aimed to explore whether intelligence that is usually not captured, or not best gathered, through structured academic channels can be systematically scanned from media, and to report on the impacts of the COVID-19 pandemic on the health workforce, the factors contributing to the pervasiveness of those impacts, and the policy responses, as depicted in publicly available news articles. Our focus was to investigate these impacts and, concurrently, to assess the feasibility of rapidly gathering health workforce insights from open sources.
Methods
We conducted an NLP-assisted media content analysis of open-source news coverage on the COVID-19 pandemic published between January 2020 and June 2022. A data set of 3,299,158 English news articles on the COVID-19 pandemic was extracted from the World Health Organization Epidemic Intelligence through Open Sources (EIOS) system. The data preparation phase included developing a rules-based classification, fine-tuning an NLP summarization model, and further data processing. Following relevance evaluation, the summaries were analyzed using a deductive-inductive approach that included data extraction, inductive coding, and theme grouping.
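As an illustration of what a rules-based classification step can look like, the following is a minimal sketch, not the classifier used in this study. The keyword lists, the `is_relevant` function, and the `min_hits` threshold are all hypothetical; the actual rules are not described in this abstract. The sketch flags an article as relevant only when it mentions both a health workforce term and an impact term.

```python
import re

# Hypothetical keyword lists for illustration only; the rules actually
# used in the study are not specified in this abstract.
WORKFORCE_TERMS = [
    "health worker", "healthcare worker", "nurse", "doctor",
    "physician", "midwife", "paramedic",
]
IMPACT_TERMS = [
    "strike", "shortage", "burnout", "ppe",
    "personal protective equipment", "salary", "workload",
]


def _compile(terms):
    """Build a case-insensitive pattern matching any term as a whole word,
    optionally followed by a plural 's'."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")s?\b", re.IGNORECASE)


WORKFORCE_RE = _compile(WORKFORCE_TERMS)
IMPACT_RE = _compile(IMPACT_TERMS)


def is_relevant(text, min_hits=1):
    """Return True when the text contains at least `min_hits` workforce
    terms and at least `min_hits` impact terms (an assumed rule)."""
    workforce_hits = len(WORKFORCE_RE.findall(text))
    impact_hits = len(IMPACT_RE.findall(text))
    return workforce_hits >= min_hits and impact_hits >= min_hits


articles = [
    "Nurses went on strike over unpaid salaries.",
    "Stock markets fell as COVID-19 cases rose.",
]
relevant = [article for article in articles if is_relevant(article)]
```

A simple two-list rule of this kind is cheap to run over millions of articles, which is one reason a rules-based filter might precede a more expensive summarization model.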
Results
After processing and classifying the initial data set of 3,299,158 news articles and reports, a data set of 5131 articles comprising 3,007,693 words was derived. The NLP summarization model reduced the length of each article, yielding a total of 496,209 words and facilitating agile human analysis. Media content analysis yielded results in 3 sections: areas of COVID-19 impacts and their pervasiveness, contributing factors to COVID-19–related impacts, and responses to the impacts. The results suggest that insufficient remuneration and compensation packages have been key disruptors for the health workforce during the COVID-19 pandemic, leading to industrial actions and mental health burdens. Shortages of personal protective equipment and occupational risks have increased infection and death risks, particularly at the pandemic’s onset. Workload and staff shortages became a growing disruption as the pandemic progressed.
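The scale of the reduction can be checked directly from the counts reported above. The short calculation below uses only those four figures; the variable names are illustrative. It suggests that roughly 0.16% of the initial articles were retained and that the summaries contain about 16.5% of the words in the retained articles, taking the mean length from about 586 to about 97 words per article.

```python
# Counts as reported in the Results section.
initial_articles = 3_299_158
retained_articles = 5131
full_text_words = 3_007_693
summary_words = 496_209

# Share of the initial articles kept after processing and classification.
article_retention = retained_articles / initial_articles

# Share of the original word count that remains after summarization.
word_ratio = summary_words / full_text_words

# Mean article length before and after summarization.
mean_words_full = full_text_words / retained_articles
mean_words_summary = summary_words / retained_articles

print(f"Articles retained: {article_retention:.2%}")
print(f"Words remaining after summarization: {word_ratio:.1%}")
print(f"Mean words per article: {mean_words_full:.0f} -> {mean_words_summary:.0f}")
```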
Conclusions
This study demonstrates the capacity of artificial intelligence–assisted media content analysis to extract health workforce insights from open-source news articles and reports. Adequate remuneration packages and personal protective equipment supplies should be prioritized as preventive measures to reduce the initial impact of future pandemics on the health workforce. Interventions aimed at lessening the emotional toll and workload need to be formulated as part of reactive measures, enhancing the efficiency and maintainability of health delivery during a pandemic.