Objectives To provide insights for on-site development of transformer-based structuring of free-text report databases by investigating different labeling and pre-training strategies. Methods A total of 93,368 German chest X-ray reports from 20,912 intensive care unit (ICU) patients were included. Two labeling strategies were investigated to tag six findings of the attending radiologist. First, a system based on human-defined rules was applied for annotation of all reports (termed “silver labels”). Second, 18,000 reports were manually annotated in 197 h (termed “gold labels”), of which 10% were used for testing. An on-site pre-trained model (Tmlm) using masked-language modeling (MLM) was compared to a public, medically pre-trained model (Tmed). Both models were fine-tuned for text classification on silver labels only, on gold labels only, and first on silver and then on gold labels (hybrid training), using varying numbers (N: 500, 1000, 2000, 3500, 7000, 14,580) of gold labels. Macro-averaged F1-scores (MAF1) in percent were calculated with 95% confidence intervals (CI). Results Tmlm,gold (95.5 [94.5–96.3]) showed significantly higher MAF1 than Tmed,silver (75.0 [73.4–76.5]) and Tmlm,silver (75.2 [73.6–76.7]), but not significantly higher MAF1 than Tmed,gold (94.7 [93.6–95.6]), Tmed,hybrid (94.9 [93.9–95.8]), and Tmlm,hybrid (95.2 [94.3–96.0]). When using 7000 or fewer gold-labeled reports, Tmlm,gold (N: 7000, 94.7 [93.5–95.7]) showed significantly higher MAF1 than Tmed,gold (N: 7000, 91.5 [90.0–92.8]). With at least 2000 gold-labeled reports, utilizing silver labels did not lead to significant improvement of Tmlm,hybrid (N: 2000, 91.8 [90.4–93.2]) over Tmlm,gold (N: 2000, 91.4 [89.9–92.8]). Conclusions Custom pre-training of transformers and fine-tuning on manual annotations promises to be an efficient strategy to unlock report databases for data-driven medicine.
Key Points
• On-site development of natural language processing methods that retrospectively unlock free-text databases of radiology clinics for data-driven medicine is of great interest.
• For clinics seeking to develop methods on-site for retrospective structuring of a report database of a certain department, it remains unclear which of the previously proposed strategies for labeling reports and pre-training models is the most appropriate in the context of, e.g., available annotator time.
• Using a custom pre-trained transformer model, along with a little annotation effort, promises to be an efficient way to retrospectively structure radiological databases, even when millions of reports are not available for pre-training.
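The macro-averaged F1-score (MAF1) with a 95% CI reported in the Results above can be illustrated with a minimal sketch. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, and the function names (`f1_binary`, `macro_f1`, `bootstrap_ci`) and the toy data are hypothetical. Each report is represented as a row of 0/1 flags, one per finding.

```python
import random


def f1_binary(y_true, y_pred):
    """F1-score for one finding, given parallel lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention (an assumption): F1 is 0 when a finding never occurs
    # and is never predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-finding F1-scores, in percent."""
    n_labels = len(y_true[0])
    scores = [
        f1_binary([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return 100.0 * sum(scores) / n_labels


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample reports with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example with two findings and four reports (made-up data).
toy_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
print(macro_f1(toy_true, toy_pred))
print(bootstrap_ci(toy_true, toy_pred))
```

Macro-averaging weights every finding equally regardless of its prevalence, so rare findings influence the score as much as common ones.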
Background The Preoperative Score to Predict Postoperative Mortality (POSPOM), based on preoperatively available data, was presented by Le Manach et al. in 2016. This prognostic model considers the kind of surgical procedure, the patient's age and 15 defined comorbidities to predict the risk of postoperative in-hospital mortality. The objective of the present study was to validate POSPOM for the German healthcare coding system (G-POSPOM). Methods and findings All cases involving anaesthesia performed at the University Hospital Bonn between 2006 and 2017 were analysed retrospectively. Procedures codified according to the French Groupes Homogènes de Malades (GHM) were translated and adapted to the German Operationen- und Prozedurenschlüssel (OPS). Comorbidities were identified from the documented International Statistical Classification of Diseases (ICD-10) coding. POSPOM was calculated for the analysed patient cohort using these data according to the method described by Le Manach et al. Performance of the adapted POSPOM was tested using the c-statistic, the Brier score and a calibration plot. Validation was performed using data from 199,780 surgical cases. With a mean age of 56.33 years (SD 18.59) and a proportion of 49.24% females, the overall cohort had a mean POSPOM value of 18.18 (SD 8.11). There were 4,066 in-hospital deaths, corresponding to an in-hospital mortality rate of 2.04% (95% CI 1.97 to 2.09%) in our sample. POSPOM showed good performance, with a c-statistic of 0.771 and a Brier score of 0.021. Conclusions After adapting POSPOM to the German coding system, we were able to validate the score using patient data of a German university hospital. In line with previous results for French patient cohorts, we observed a good correlation of POSPOM with in-hospital mortality. Therefore, further adjustments of POSPOM, including multicentre and transnational validation, should be pursued based on this proof of concept.
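The two performance measures named above can be sketched in a few lines. This is an illustration of the standard definitions only, not the study's analysis code; the function names and the toy data are hypothetical. The c-statistic is the fraction of (death, survivor) pairs in which the death received the higher predicted risk, with ties counted as one half, and the Brier score is the mean squared difference between predicted risk and observed outcome.

```python
def c_statistic(outcomes, risks):
    """Concordance between predicted risk and a binary outcome (1 = death)."""
    deaths = [r for y, r in zip(outcomes, risks) if y == 1]
    survivors = [r for y, r in zip(outcomes, risks) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    # Pairwise comparison is quadratic in cohort size; a rank-based
    # formula would be preferable for very large cohorts.
    for d in deaths:
        for s in survivors:
            if d > s:
                concordant += 1.0
            elif d == s:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def brier_score(outcomes, risks):
    """Mean squared error of predicted probabilities (lower is better)."""
    return sum((r - y) ** 2 for y, r in zip(outcomes, risks)) / len(outcomes)


# Toy example (made-up data): two survivors, two deaths.
toy_outcomes = [0, 0, 1, 1]
toy_risks = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(toy_outcomes, toy_risks))
print(brier_score(toy_outcomes, toy_risks))
```

The c-statistic measures discrimination only, which is why a calibration measure such as the Brier score or a calibration plot is usually reported alongside it.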
Background The Preoperative Score to Predict Postoperative Mortality (POSPOM) assesses the patient's individual risk of postsurgical intrahospital death based on preoperative parameters. We hypothesized that mortality predicted by the POSPOM varies depending on the level of postoperative care. Methods All patients aged over 18 years undergoing inpatient surgery or interventions involving anesthesia at a German university hospital between January 2006 and December 2017 were assessed for eligibility for this retrospective study. The endpoint was death in hospital following surgery. Adaptation of the POSPOM to the German coding system was performed as previously described. The whole cohort was divided according to the level of postoperative care (normal ward vs. intensive care unit (ICU) admission within 24 h vs. ICU admission later than 24 h). Results A total of 199,258 patients were included. Observed intrahospital mortality was 2.0% (4,053 deaths). 9.6% of patients were transferred to the ICU following surgery, and mortality in those patients was already increased at low POSPOM values of 15. 17,165 patients were admitted to the ICU within 24 h; these patients were older, had more comorbidities, or underwent more invasive surgery, as reflected by a higher median POSPOM score compared to the normal-ward group (29 vs. 17, p <0.001). Mortality in that cohort was significantly increased to 8.7% (p <0.001). 2,043 patients were admitted to the ICU later than 24 h following surgery (therefore denoted unscheduled admission), and the median POSPOM value of that group was 23. Observed mortality in this cohort was highest (13.5%, p <0.001 vs. ICU admission <24 h cohort). Conclusion Increased mortality in patients transferred to high-care wards reflects the significance of, e.g., intra- or early postoperative events for the patient's outcome. Therefore, scoring systems that consider only preoperative variables, such as the POSPOM, are limited in their ability to predict the individual benefit of postoperative ICU admission.
Radiologists commonly conduct chest X-rays for the diagnosis of pathologies or the evaluation of extrathoracic material positions in intensive care unit (ICU) patients. Automated assessments of radiographs have the potential to assist physicians by detecting pathologies that pose an emergency, leading to faster initiation of treatment and optimization of clinical workflows. The amount and quality of training data are key factors in developing deep learning models with reliable performance. This work investigates the effects of transfer learning on public data, automatically generated data labels and manual data annotation on the classification of ICU chest X-rays of the University Hospital Bonn.