Thokozile Manaka scite author profile

As a result of the restricted access to data in healthcare facilities due to patient privacy and confidentiality policies, the application of general natural language processing (NLP) techniques has advanced relatively slowly in the health domain. Additionally, because clinical data is unique to various institutions and laboratories, there aren't enough standards and standards for data annotation. In places without robust death registration systems, the cause of death (COD) is determined through a verbal autopsy (VA) report. A non-clinician field agent completes a VA report using a set of predefined questions as a guidance in order to identify the symptoms of a COD. The narrative text of the VA report is used as a case study to examine the difficulties of applying NLP techniques to the health care domain.Motivated by multi-step learning where a final learning task is realized via a sequence of intermediate learning tasks, this paper presents a framework to leverage knowledge across the two domain adaptation settings of feature extraction and fine-tuning to improve VA narrative representations for COD classification task in the health domain. We use the Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) models pretrained on the general English and health domains to extract features from the VA narratives. Our results demonstrate increased performance when the two embeddings from the English and health domains—learned through the domain adaptation tasks of feature extraction and fine-tuning are combined. The advantage of using character embeddings in conjunction with word embeddings is also apparent.

show abstract

Improving Cause-of-Death Classification from Verbal Autopsy Reports

Manaka¹,

Zyl²,

Kar³

2022

Preprint

View full text Add to dashboard Cite

In many lower-and-middle income countries including South Africa, data access in health facilities is restricted due to patient privacy and confidentiality policies. Further, since clinical data is unique to individual institutions and laboratories, there are insufficient data annotation standards and conventions. As a result of the scarcity of textual data, natural language processing (NLP) techniques have fared poorly in the health sector. A cause of death (COD) is often determined by a verbal autopsy (VA) report in places without reliable death registration systems. A non-clinician field worker does a VA report using a set of standardized questions as a guide to uncover symptoms of a COD. This analysis focuses on the textual part of the VA report as a case study to address the challenge of adapting NLP techniques in the health domain. We present a system that relies on two transfer learning paradigms of monolingual learning and multi-source domain adaptation to improve VA narratives for the target task of the COD classification. We use the Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) models pre-trained on the general English and health domains to extract features from the VA narratives. Our findings suggest that this transfer learning system improves the COD classification tasks and that the narrative text contains valuable information for figuring out a COD. Our results further show that combining binary VA features and narrative text features learned via this framework boosts the classification task of COD.

show abstract

Using Machine Learning to Fuse Verbal Autopsy Narratives and Binary Features in the Analysis of Deaths from Hyperglycaemia

Manaka¹,

Zyl²,

Wade³

et al. 2022

Preprint

View full text Add to dashboard Cite

Lower-and-middle income countries are faced with challenges arising from a lack of data on cause of death (COD), which can limit decisions on population health and disease management. A verbal autopsy (VA) can provide information about a COD in areas without robust death registration systems. A VA consists of structured data, combining numeric and binary features, and unstructured data as part of an openended narrative text. This study assesses the performance of various machine learning approaches when analyzing both the structured and unstructured components of the VA report. The algorithms were trained and tested via cross-validation in the three settings of binary features, text features and a combination of binary and text features derived from VA reports from rural South Africa. The results obtained indicate narrative text features contain valuable information for determining COD and that a combination of binary and text features improves the automated COD classification task.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.