Objective: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases.
Materials and Methods: Criteria2Query uses a hybrid information extraction pipeline that combines machine learning and rule-based methods to systematically parse eligibility criteria text. It transforms the text first into a structured criteria representation and then into sharable, executable clinical data queries, expressed as SQL conforming to the OMOP Common Data Model. Users can interactively review, refine, and execute the queries in the ATLAS web application. To test effectiveness, we evaluated 125 criteria from ClinicalTrials.gov spanning different disease domains, plus 52 user-entered criteria. We measured F1 score and accuracy using 2 domain experts as the reference standard and calculated the average computation time for fully automated query formulation. We also conducted an anonymous survey to evaluate usability.
Results: Criteria2Query achieved F1 scores of 0.795 for entity recognition and 0.805 for relation extraction. Accuracies for negation detection, logic detection, entity normalization, and attribute normalization were 0.984, 0.864, 0.514, and 0.793, respectively. Fully automatic query formulation took 1.22 seconds per criterion. More than 80% of users (at least 11 of 13) would use Criteria2Query in their future cohort definition tasks.
Conclusions: We contribute a novel natural language interface to clinical databases. It is open source and supports both fully automated and interactive modes, enabling researchers to define cohorts autonomously in a data-driven way with minimal human effort. We demonstrate its promising user-friendliness and usability.
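To make the last step of that pipeline concrete, here is a minimal Python sketch of turning already-normalized criteria into SQL over OMOP CDM tables. It is not Criteria2Query's query generator: the `Criterion` record, the two-domain mapping, and the concept IDs in the usage line are hypothetical placeholders, and AND logic is rendered simply as an `INTERSECT` of person sets. Only the table and column names (`person`, `condition_occurrence.condition_concept_id`, `drug_exposure.drug_concept_id`) are standard OMOP CDM names.

```python
from dataclasses import dataclass


# Hypothetical structured form of one parsed criterion; the field names are
# illustrative and not Criteria2Query's actual representation.
@dataclass(frozen=True)
class Criterion:
    domain: str       # "condition" or "drug" in this sketch
    concept_id: int   # OMOP standard concept ID after entity normalization
    negated: bool     # True when negation detection flags the criterion


# OMOP CDM event table and concept-ID column for each supported domain.
DOMAIN_TABLES = {
    "condition": ("condition_occurrence", "condition_concept_id"),
    "drug": ("drug_exposure", "drug_concept_id"),
}


def criterion_to_sql(c: Criterion) -> str:
    """Render one criterion as a query returning matching person_ids."""
    table, column = DOMAIN_TABLES[c.domain]
    op = "NOT IN" if c.negated else "IN"
    return (
        f"SELECT p.person_id FROM person p WHERE p.person_id {op} "
        f"(SELECT e.person_id FROM {table} e "
        f"WHERE e.{column} = {int(c.concept_id)})"
    )


def cohort_sql(criteria: list[Criterion]) -> str:
    """Combine criteria with AND logic by intersecting their person sets."""
    return "\nINTERSECT\n".join(criterion_to_sql(c) for c in criteria)
```

For example, `cohort_sql([Criterion("condition", 1111111, False), Criterion("drug", 2222222, True)])` yields one inclusion subquery and one `NOT IN` exclusion subquery joined by `INTERSECT`. The other pipeline outputs named in the abstract (logic detection, attribute normalization) would add OR groups and value or temporal constraints, which this sketch omits.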
Integrating detailed phenotype information with genetic data is well established as a way to facilitate accurate diagnosis of hereditary disorders. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, accurately and efficiently extracting phenotypes from heterogeneous EHR narratives remains a challenge. Here, we present EHR-Phenolyzer, a high-throughput EHR framework for extracting and analyzing phenotypes. EHR-Phenolyzer extracts and normalizes Human Phenotype Ontology (HPO) concepts from EHR narratives and then prioritizes genes with causal variants on the basis of the HPO-coded phenotype manifestations. We assessed EHR-Phenolyzer on 28 pediatric individuals with confirmed diagnoses of monogenic diseases and found that the genes with causal variants were ranked among the top 100 genes selected by EHR-Phenolyzer for 16/28 individuals (p < 2.2 × 10), supporting the value of phenotype-driven gene prioritization in diagnostic sequence interpretation. To assess generalizability, we replicated this finding on an independent EHR dataset of ten individuals with a positive diagnosis from a different institution. We then assessed broader utility by examining two additional EHR datasets: 31 individuals who were suspected of having a Mendelian disease and underwent different types of genetic testing, and 20 individuals with positive diagnoses of specific Mendelian etiologies of chronic kidney disease from exome sequencing. Finally, through several retrospective case studies, we demonstrated how combined analysis of genotype data and deep phenotype data from EHRs can expedite genetic diagnoses. In summary, EHR-Phenolyzer leverages EHR narratives to automate phenotype-driven analysis of clinical exomes or genomes, facilitating the broader implementation of genomic medicine.
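The evaluation above asks whether the causal gene lands in the top 100 of a phenotype-driven ranking. The Python sketch below shows that idea with a plain term-overlap score between a patient's HPO terms and per-gene HPO annotations; it is far simpler than the prioritization EHR-Phenolyzer actually uses, and the gene names and annotation sets are hypothetical.

```python
def rank_genes(patient_hpo: set[str],
               gene_annotations: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank genes by how many of the patient's HPO terms each gene carries.

    This is a simple overlap count, used here only to illustrate
    phenotype-driven prioritization.
    """
    scores = {gene: len(patient_hpo & terms)
              for gene, terms in gene_annotations.items()}
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def causal_gene_in_top_n(ranking: list[tuple[str, int]],
                         causal_gene: str, n: int = 100) -> bool:
    """Check whether the known causal gene appears among the top n genes."""
    return causal_gene in [gene for gene, _ in ranking[:n]]


# Hypothetical gene-to-phenotype annotations, for illustration only.
ANNOTATIONS = {
    "GENE_A": {"HP:0001250", "HP:0000252", "HP:0001263"},
    "GENE_B": {"HP:0001250"},
    "GENE_C": {"HP:0000365"},
}
```

With `rank_genes({"HP:0001250", "HP:0000252"}, ANNOTATIONS)`, `GENE_A` scores 2, `GENE_B` 1, and `GENE_C` 0; counting how many patients satisfy `causal_gene_in_top_n` across a cohort gives a hit rate analogous to the 16/28 reported above.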
Immune checkpoint inhibitors have greatly improved the prognosis of diverse advanced malignancies, including gastric and gastroesophageal junction (G/GEJ) cancer. However, the role of anti-programmed cell death protein-1 (PD-1) treatment in the neoadjuvant setting remains unclear. This phase 2 study aimed to evaluate sintilimab plus CapeOx as a neoadjuvant regimen in patients with advanced resectable G/GEJ adenocarcinoma. Eligible patients with resectable G/GEJ adenocarcinoma of stage cT3–4NanyM0 were enrolled. Patients received neoadjuvant sintilimab (3 mg/kg for patients weighing <60 kg or 200 mg for those weighing ≥60 kg, on day 1) plus CapeOx (oxaliplatin 130 mg/m² on D1 and capecitabine 1000 mg/m² twice daily on D1–D14) every 21 days for three cycles before surgical resection, followed by three cycles of adjuvant CapeOx at the same dosages after resection. The primary endpoint was the pathological complete response (pCR) rate. Secondary endpoints included objective response rate, tumor regression grade per the Becker criteria, survival, and safety. As of July 30, 2020, 36 patients had been enrolled. In total, 7 (19.4%) patients had GEJ cancer and 34 (94.4%) had clinical stage III disease. Thirty-five (97.2%) patients completed three cycles of neoadjuvant treatment, and 1 patient received only two cycles because of adverse events. All patients underwent surgery, and the R0 resection rate was 97.2%. In this study, pCR and major pathological response were achieved in 7 (19.4%; 95% CI: 8.8% to 35.7%; 90% CI: 10.7% to 33.1%) and 17 (47.2%; 95% CI: 31.6% to 64.3%) patients, respectively. Thirty-one patients received adjuvant treatment. By December 20, 2021, three patients had died after disease relapse, and two patients were alive with relapse. Median disease-free survival (DFS) and overall survival (OS) were not reached. The 1-year DFS and OS rates were 90.3% (95% CI: 80.4% to 100.0%) and 94.1% (95% CI: 86.5% to 100.0%), respectively.
The most common (>1 patient) grade 3 treatment-related adverse events (AEs) during neoadjuvant treatment were anemia and neutropenia (n=5 each, 13.9%). No serious AEs or grade 4–5 AEs were observed. Sintilimab plus oxaliplatin/capecitabine showed promising efficacy in the neoadjuvant setting, with an encouraging pCR rate and a good safety profile. This combination regimen may offer a new option for patients with locally advanced, resectable G/GEJ adenocarcinoma. Trial registration: NCT04065282.
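The neoadjuvant dosing rules in this regimen reduce to simple per-cycle arithmetic. The Python sketch below restates them as written in the abstract. It takes body surface area (BSA) as an input because the abstract does not say how BSA was computed, the function and field names are hypothetical, and it applies no dose rounding, caps, or reductions, so it only illustrates the stated schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleDoses:
    sintilimab_mg: float             # given on day 1
    oxaliplatin_mg: float            # given on day 1
    capecitabine_mg_per_dose: float  # given twice daily on days 1-14


def neoadjuvant_cycle_doses(weight_kg: float, bsa_m2: float) -> CycleDoses:
    """Per-cycle doses as stated in the abstract (hypothetical helper)."""
    # Sintilimab: 3 mg/kg below 60 kg, flat 200 mg at 60 kg or above.
    sintilimab = 3.0 * weight_kg if weight_kg < 60 else 200.0
    oxaliplatin = 130.0 * bsa_m2     # 130 mg/m² on D1
    capecitabine = 1000.0 * bsa_m2   # 1000 mg/m² per dose
    return CycleDoses(sintilimab, oxaliplatin, capecitabine)


def capecitabine_total_per_cycle(doses: CycleDoses) -> float:
    """Total capecitabine per 21-day cycle: two doses a day for 14 days."""
    return doses.capecitabine_mg_per_dose * 2 * 14
```

For a 55 kg patient with a BSA of 1.5 m², this gives 165 mg sintilimab, 195 mg oxaliplatin, and 1500 mg capecitabine per dose (42,000 mg per cycle); a 70 kg patient receives the flat 200 mg sintilimab dose.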
We present Doc2Hpo, a web application that enables interactive and efficient curation of phenotype concepts from clinical text, with automated concept normalization using the Human Phenotype Ontology (HPO). Users can edit the HPO concepts automatically extracted by Doc2Hpo in real time and export them to gene prioritization tools. Our evaluation showed that Doc2Hpo significantly reduced manual effort while achieving high accuracy in HPO concept curation. Doc2Hpo is freely available at https://impact2.dbmi.columbia.edu/doc2hpo/. The source code is available at https://github.com/stormliucong/doc2hpo for local installation with protected health data.
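As a rough illustration of automated extraction followed by export, the Python sketch below does dictionary-based matching of phenotype phrases to HPO IDs with a simple negation check, then exports the non-negated IDs. It is not Doc2Hpo's implementation: the three-entry lexicon and the negation cue list are hypothetical simplifications (the cue approach is only loosely in the spirit of NegEx), although the HPO IDs shown are the real ones for seizure, microcephaly, and global developmental delay.

```python
import re

# Tiny illustrative lexicon: surface phrase -> HPO ID.
LEXICON = {
    "seizures": "HP:0001250",
    "seizure": "HP:0001250",
    "microcephaly": "HP:0000252",
    "global developmental delay": "HP:0001263",
}
# Longest phrases first so the regex alternation prefers the longest match.
PHRASE_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in sorted(LEXICON, key=len, reverse=True)) + r")\b"
)
# Hypothetical, minimal set of negation cues.
NEGATION_RE = re.compile(r"\b(no|denies|without|negative for)\b")


def extract_hpo(text: str) -> list[dict]:
    """Find lexicon phrases and flag those preceded by a negation cue in the same sentence."""
    concepts = []
    for sentence in re.split(r"[.;]", text.lower()):
        for m in PHRASE_RE.finditer(sentence):
            negated = bool(NEGATION_RE.search(sentence[: m.start()]))
            concepts.append({"text": m.group(1), "hpo_id": LEXICON[m.group(1)], "negated": negated})
    return concepts


def export_hpo_ids(concepts: list[dict]) -> list[str]:
    """Collect unique, non-negated HPO IDs, e.g. as input to a gene prioritization tool."""
    return sorted({c["hpo_id"] for c in concepts if not c["negated"]})
```

On "Patient has microcephaly and global developmental delay. No seizures.", the sketch finds three concepts, marks "seizures" as negated, and exports `["HP:0000252", "HP:0001263"]`. In an interactive tool, the user's edits would modify this concept list before export.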
Chinese relation extraction is typically performed with neural networks that take either character-based or word-based inputs, and most existing methods suffer from segmentation errors and polysemy ambiguity. To address these issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction that takes advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided, and (2) we model multiple senses of polysemous words with the help of external linguistic knowledge to alleviate polysemy ambiguity. Experiments on three real-world datasets in distinct domains show consistent and significant superiority and robustness of our model compared with other baselines. The source code of this paper can be obtained from https://github.com/thunlp/Chinese_NRE.
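Point (1) relies on matching lexicon words against the raw character sequence instead of committing to a single segmentation. The Python sketch below shows only that matching step: it enumerates every multi-character lexicon word in a sentence and groups the words by their last character. In lattice-LSTM-style models, that is roughly where a word's information is merged into the character sequence. The lexicon, the `max_word_len` limit, and the helper names are hypothetical; the sense modeling in point (2) and the network itself are not shown.

```python
def build_lattice(chars: str, lexicon: set[str],
                  max_word_len: int = 4) -> list[tuple[int, int, str]]:
    """Return (start, end, word) for every multi-character lexicon word in chars.

    `end` is the inclusive index of the word's last character. Single characters
    are not listed because they already form the base sequence.
    """
    cells = []
    for start in range(len(chars)):
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > len(chars):
                break
            word = chars[start:end]
            if word in lexicon:
                cells.append((start, end - 1, word))
    return cells


def words_ending_at(cells: list[tuple[int, int, str]]) -> dict[int, list[str]]:
    """Group matched words by the index of the character where they end."""
    by_end: dict[int, list[str]] = {}
    for _, end, word in cells:
        by_end.setdefault(end, []).append(word)
    return by_end


# Classic segmentation-ambiguity example with a hypothetical lexicon.
SENTENCE = "南京市长江大桥"
LEXICON = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江大桥"}
```

Because all competing words (for example 市长 and 长江, which overlap on 长) stay in the lattice, a downstream model can weigh them instead of inheriting one segmenter's mistake.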