A Systematic Approach to Configuring MetaMap for Optimal Performance

Jing, Xia; Indani, Akash; Hubig, Nina; Min, Hua; Gong, Yang; Cimino, James J.; Sittig, Dean F.; Rennert, Lior; Robinson, David; Biondich, Paul G.; Wright, Adam; Nøhr, Christian; Law, Timothy; Faxvaag, Arild; Gimbel, Ronald W.

doi:10.1055/a-1862-0421

Cited by 1 publication

(1 citation statement)

References 22 publications

“…Now, only 3148 unlabeled articles remained, and we created synthetic KP and marked the labels to create a synthetic labeled dataset for the CDSS domain with a 1:2 train-validation split. Cohen’s kappa rates for the first 42 (GS42) abstracts were 0.93 (between annotators 1 and 2) and 0.73 (between annotators 1 and 3) [37]. For the second set of abstracts (GS91), Cohen’s kappa rates were 0.87 (between annotators 1 and 2) and 0.97 (between annotators 1 and 3).…”

Section: Experiments and Results (mentioning)

confidence: 99%
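
The Cohen's kappa rates quoted in the statement above measure agreement between two annotators after correcting for agreement expected by chance. As a minimal sketch of that calculation, the function below computes kappa for two label sequences. The function name `cohens_kappa` and the toy labels are illustrative assumptions; they are not the annotations or the code used in the cited study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both annotators used a single identical label, agreement is trivially perfect.
    if p_e == 1.0:
        return 1.0

    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy annotations ("KP" = keyphrase, "O" = other), for illustration only.
annotator_1 = ["KP", "KP", "O", "O"]
annotator_2 = ["KP", "O", "O", "O"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 3 of 4 items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5. Values such as the 0.93 and 0.73 reported in the statement are read on the same scale, with 1 meaning perfect agreement.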

Keyphrase Identification Using Minimal Labeled Data with Hierarchical Contexts and Transfer Learning

Rohan, Hubig, Min, et al. 2023

Preprint

Self Cite

Interoperable clinical decision support system (CDSS) rules are a pathway to achieving interoperability, which is a well-recognized challenge in health information technology. Building an ontology facilitates the creation of interoperable CDSS rules, and this can be achieved by identifying keyphrases (KP) in the existing literature. However, KP identification for labeling the data requires human expertise, consensus, and contextual understanding. This paper presents a semi-supervised framework for the CDSS domain that uses minimal labeled data and is based on hierarchical attention over the documents fused with domain adaptation approaches, and it evaluates the effectiveness of KP identification with this framework. From a semi-supervised learning perspective, our methodology outperforms prior neural architectures by learning with document-level context, using no explicit hand-crafted features, transferring knowledge from models pre-trained on an unlabeled corpus, and then fine-tuning on a smaller gold-standard labeled dataset. To the best of our knowledge, this is the first functional framework for KP identification in the CDSS sub-domain that is trained on limited labeled data. It contributes to general natural language processing (NLP) architectures in areas such as clinical NLP, where manual data labeling is challenging.
