Background
Medical narratives are fundamental to the correct identification of a patient's health condition. They not only describe the patient’s situation but also contain relevant information about the patient’s context and the evolution of their health state. Narratives are usually vague and cannot be categorized easily. However, once the patient’s situation has been correctly identified from a narrative, it can be mapped onto precise, machine-readable classification schemas and ontologies. To this end, language models can be trained to read and extract elements from these narratives. The main obstacle is the lack of data for model identification and training in languages other than English. Available alternatives such as MIMIC (Johnson et al. 2016) are written in English and cover specific patient conditions, such as intensive care. Training on such data could therefore introduce bias when models are needed for other types of patients, such as oncology patients. To facilitate the training of clinical narrative models, a method for creating high-quality synthetic narratives is needed.
Method
We devised workflows based on generative AI methods to synthesize narratives in the German language. Because we required highly realistic narratives, we generated prompts written with high-quality medical terminology that ask for clinical narratives containing both a main disease and a co-disease. The frequency distribution of main diseases and co-diseases was extracted from the hospital’s structured data, so that the synthetic narratives reflect the disease distribution in the patient cohort. To validate the quality of the synthetic narratives, we annotated them and used them to train a Named Entity Recognition (NER) algorithm. Under our assumptions, successful validation of this system implies that the synthesized data used for its training are of acceptable quality.
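The prompt-generation step can be illustrated with a minimal sketch. It assumes that disease pairs are drawn in proportion to their observed counts and inserted into a fixed German prompt template. The counts, disease names, template wording, and the function names `sample_pairs` and `build_prompt` are hypothetical placeholders and are not taken from the study.

```python
import random

# Hypothetical counts of (main disease, co-disease) pairs. In practice these
# would be derived from the hospital's structured data.
PAIR_COUNTS = {
    ("Mammakarzinom", "Diabetes mellitus Typ 2"): 40,
    ("Bronchialkarzinom", "COPD"): 35,
    ("Kolonkarzinom", "Arterielle Hypertonie"): 25,
}


def sample_pairs(pair_counts, n, seed=0):
    """Draw n (main, co-disease) pairs with probability proportional to count."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(pair_counts)
    weights = [pair_counts[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=n)


def build_prompt(main, co):
    """Fill an illustrative German prompt template with the two diagnoses."""
    return (
        "Schreibe einen realistischen klinischen Arztbrief auf Deutsch "
        f"für einen Patienten mit der Hauptdiagnose {main} "
        f"und der Nebendiagnose {co}."
    )


# Each prompt would then be sent to a generative model of choice.
prompts = [build_prompt(m, c) for m, c in sample_pairs(PAIR_COUNTS, 5)]
```

Sampling with weights, rather than enumerating every pair once, is what lets the set of generated narratives mirror the cohort's disease distribution as the number of samples grows.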
Result
We report precision, recall, and F1 score for the NER model, and also consider metrics that account for both exact and partial entity matches. We obtained a precision of 0.851 for the Entity Type match metric, with an F1 score of 0.188.
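To make the difference between exact and entity-type matching concrete, the following sketch computes precision, recall, and F1 under both criteria. It is a simplified illustration, not the evaluation code used in the study: entities are assumed to be `(start, end, label)` tuples, a type match is assumed to mean "same label and overlapping spans", and the helper names and example data are hypothetical.

```python
def overlaps(a, b):
    """True if the half-open character spans of two entities intersect."""
    return a[0] < b[1] and b[0] < a[1]


def exact_match(gold, pred):
    """Same boundaries and same label."""
    return gold == pred


def type_match(gold, pred):
    """Same label and any span overlap (simplified entity-type criterion)."""
    return gold[2] == pred[2] and overlaps(gold, pred)


def prf(gold_entities, pred_entities, match):
    """Precision, recall, F1 with greedy one-to-one matching."""
    used = set()  # indices of gold entities that are already matched
    tp = 0
    for pred in pred_entities:
        for i, gold in enumerate(gold_entities):
            if i not in used and match(gold, pred):
                used.add(i)
                tp += 1
                break
    precision = tp / len(pred_entities) if pred_entities else 0.0
    recall = tp / len(gold_entities) if gold_entities else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations: the second prediction has a shifted start offset.
gold = [(0, 5, "DISEASE"), (10, 20, "DISEASE"), (30, 35, "DRUG")]
pred = [(0, 5, "DISEASE"), (12, 20, "DISEASE")]

exact_scores = prf(gold, pred, exact_match)
type_scores = prf(gold, pred, type_match)
```

In this toy example the shifted prediction counts only under the type criterion, so the entity-type scores are higher than the exact scores. Because F1 is the harmonic mean of precision and recall, a high precision can still coexist with a low F1 when recall is low.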
Conclusion
Despite its inherent limitations, synthetic narrative generation can accelerate model identification and training. With this approach, data can be made interoperable across languages and regions without compromising data safety.