Objective: We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnoses, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available research on information extraction (IE) from German medical texts.

Materials and Methods: BRONCO consists of 200 manually deidentified discharge summaries of cancer patients. Annotation followed a structured and quality-controlled process involving 2 groups of medical experts to ensure consistency, comprehensiveness, and high quality of annotations. We present results of several state-of-the-art techniques for different IE tasks as baselines for subsequent research.

Results: The annotated corpus consists of 11 434 sentences and 89 942 tokens, with 11 124 annotations of medical entities and 3118 annotations of related attributes. We publish 75% of the corpus as a set of shuffled sentences and keep 25% as a held-out data set for unbiased evaluation of future IE tools. On this held-out data set, our baselines reach, depending on the entity type, F1-scores of 0.72–0.90 for named entity recognition, 0.10–0.68 for entity normalization, 0.55 for negation detection, and 0.33 for speculation detection.

Discussion: Medical corpus annotation is a complex and time-consuming task, which makes sharing such resources all the more important.

Conclusion: To our knowledge, BRONCO is the first sizable and freely available German medical corpus. Our baseline results show that more research effort is needed to lift the quality of information extraction from German medical texts to the level already possible for English.
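The F1-scores reported above combine precision and recall into a single number. The abstract does not state the exact matching protocol, so the following is only a minimal sketch that assumes exact-match, entity-level scoring. The `Span` representation, the function name `entity_f1`, and the example labels and offsets are hypothetical and not taken from BRONCO.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match, entity-level F1 between gold and predicted spans.

    Assumption: a prediction counts as correct only if its offsets and
    label both match a gold annotation exactly.
    """
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the case of no correct predictions.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up toy annotations for one sentence.
gold = {(0, 9, "DIAGNOSIS"), (15, 27, "MEDICATION"), (30, 41, "TREATMENT")}
predicted = {(0, 9, "DIAGNOSIS"), (15, 27, "TREATMENT")}

# One of two predictions is correct (precision 1/2) and one of three
# gold entities is found (recall 1/3).
score = entity_f1(gold, predicted)
print(round(score, 2))
```

Under this assumption a prediction with the right offsets but the wrong label counts as an error, which is one reason scores can differ widely between entity types.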
Background: Structured and harmonized implementation of molecular tumor boards (MTBs) for the clinical interpretation of molecular data is a current challenge for precision oncology. Heterogeneity in the interpretation of molecular data has been shown even for patients with a limited number of molecular alterations. Integration of high-dimensional molecular data, including RNA sequencing (RNA-Seq) and whole-exome sequencing (WES), is expected to further complicate clinical application. To analyze challenges for MTB harmonization based on complex molecular datasets, we retrospectively compared the clinical interpretation of WES and RNA-Seq data by two independent MTBs.

Methods: High-dimensional molecular cancer profiling including WES and RNA-Seq was performed for patients with advanced solid tumors, no available standard therapy, ECOG performance status of 0–1, and available fresh-frozen tissue within the DKTK-MASTER Program from 2016 to 2018. Identical molecular profiling data of 40 patients were independently discussed by two MTBs after prior annotation by specialized physicians, following independent but similar workflows. Identified biomarkers and resulting treatment options were compared between the MTBs, and patients were followed up clinically.

Results: A median of 309 molecular aberrations from WES and RNA-Seq (n = 38) and 82 molecular aberrations from WES only (n = 3) were considered for clinical interpretation for 40 patients (one patient sequenced twice). A median of 3 and 2 targeted treatment options were identified per patient, respectively. Most treatment options were identified for receptor tyrosine kinase, PARP, and mTOR inhibitors, as well as immunotherapy. The mean overlap coefficient between the two MTBs was 66%. The highest agreement rates were observed for the interpretation of single nucleotide variants, clinical evidence levels 1 and 2, and monotherapy, whereas the interpretation of gene expression changes, preclinical evidence levels 3 and 4, and combination therapy yielded lower agreement rates. Patients receiving treatment following concordant MTB recommendations had significantly longer overall survival than patients receiving treatment following discrepant recommendations or physician’s choice.

Conclusions: Reproducible clinical interpretation of high-dimensional molecular data is feasible, and agreement rates are encouraging when compared to previous reports. The interpretation of molecular aberrations beyond single nucleotide variants and preclinically validated biomarkers, as well as combination therapies, were identified as additional difficulties for ongoing harmonization efforts.
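The abstract reports a mean overlap coefficient without defining it. A minimal sketch, assuming the common Szymkiewicz-Simpson definition (size of the intersection divided by the size of the smaller set), could look as follows. The function name, the treatment labels, and the handling of empty recommendation sets are illustrative assumptions, not the study's actual procedure or data.

```python
from typing import List, Set, Tuple


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """Szymkiewicz-Simpson overlap: |A ∩ B| / min(|A|, |B|).

    Assumption: if either board made no recommendation, the overlap is
    taken as 0.0; the abstract does not say how such cases were handled.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Made-up recommendations of two boards for two hypothetical patients.
patients: List[Tuple[Set[str], Set[str]]] = [
    ({"PARP inhibitor", "mTOR inhibitor", "immunotherapy"},
     {"PARP inhibitor", "immunotherapy"}),
    ({"mTOR inhibitor"}, {"immunotherapy"}),
]

per_patient = [overlap_coefficient(a, b) for a, b in patients]
mean_overlap = sum(per_patient) / len(per_patient)
print(per_patient, mean_overlap)
```

Because the denominator is the smaller set, a board whose recommendations are a strict subset of the other board's still reaches full overlap. That makes this measure more lenient than, for example, the Jaccard index.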