Precocious puberty in girls is defined as the onset of pubertal changes before 8 years of age, and gonadotropin-releasing hormone (GnRH) agonist treatment is available for central precocious puberty (CPP). The gold standard for diagnosing CPP is the GnRH stimulation test. However, the test is time-consuming and costly, and it requires repeated blood sampling. We aimed to develop an artificial intelligence (AI) prediction model to help pediatric endocrinologists decide when to perform the GnRH stimulation test. We reviewed the medical charts of 161 girls who received the GnRH stimulation test from 1 August 2010 to 31 August 2021, and we selected 15 clinically relevant features for machine learning modeling. We chose the models with the highest area under the receiver operating characteristic curve (AUC) to integrate into our computerized physician order entry (CPOE) system. The AUC values for the CPP diagnosis prediction model (LH ≥ 5 IU/L) were 0.884 with logistic regression, 0.912 with random forest, 0.942 with LightGBM, and 0.942 with XGBoost. For the Taiwan National Health Insurance treatment coverage prediction model (LH ≥ 10 IU/L), the AUC values were 0.909, 0.941, 0.934, and 0.881, respectively. In conclusion, our AI predictive system can assist pediatric endocrinologists in deciding whether a girl with suspected CPP should receive a GnRH stimulation test. With proper use, this prediction model may help avoid unnecessary invasive blood sampling for GnRH stimulation tests.
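The models above are compared by AUC. As a minimal sketch of what that number measures, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted score, with ties counted as half. The function name `roc_auc` and the labels and scores below are hypothetical illustration data, not values or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 ground-truth values (1 = positive test result).
    scores: iterable of predicted probabilities or scores, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical example data (NOT from the study): 3 positives, 3 negatives.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
example_auc = roc_auc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the model with the highest AUC was a natural candidate for integration.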
Background Negation and speculation unrelated to abnormal findings can lead to false-positive alarms for automatic radiology report highlighting or flagging by laboratory information systems. Objective This internal validation study evaluated the performance of natural language processing methods (NegEx, NegBio, NegBERT, and transformers). Methods We annotated all negative and speculative statements unrelated to abnormal findings in reports. In experiment 1, we fine-tuned several transformer models (ALBERT [A Lite Bidirectional Encoder Representations from Transformers], BERT [Bidirectional Encoder Representations from Transformers], DeBERTa [Decoding-Enhanced BERT With Disentangled Attention], DistilBERT [Distilled version of BERT], ELECTRA [Efficiently Learning an Encoder That Classifies Token Replacements Accurately], ERNIE [Enhanced Representation through Knowledge Integration], RoBERTa [Robustly Optimized BERT Pretraining Approach], SpanBERT, and XLNet) and compared their performance using precision, recall, accuracy, and F1-scores. In experiment 2, we compared the best model from experiment 1 with 3 established negation and speculation-detection algorithms (NegEx, NegBio, and NegBERT). Results Our study collected 6000 radiology reports from 3 branches of the Chi Mei Hospital, covering multiple imaging modalities and body parts. A total of 15.01% (105,755/704,512) of words and 39.45% (4529/11,480) of important diagnostic keywords occurred in negative or speculative statements unrelated to abnormal findings. In experiment 1, all models achieved an accuracy of >0.98 and F1-score of >0.90 on the test data set. ALBERT exhibited the best performance (accuracy=0.991; F1-score=0.958). In experiment 2, ALBERT outperformed the optimized NegEx, NegBio, and NegBERT methods in terms of overall performance (accuracy=0.996; F1-score=0.991), in the prediction of whether diagnostic keywords occur in speculative statements unrelated to abnormal findings, and in the improvement of the performance of keyword extraction (accuracy=0.996; F1-score=0.997). Conclusions The ALBERT deep learning method showed the best performance. Our results represent a significant advancement in the clinical applications of computer-aided notification systems.
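The four metrics used to compare the models can be sketched from a confusion matrix as follows. This is an illustrative sketch only: the function name `classification_metrics` and the gold and predicted labels are hypothetical, and it assumes a binary labeling in which 1 marks a token inside a negative or speculative statement. It is not the study's evaluation code.

```python
def classification_metrics(gold, pred):
    """Precision, recall, accuracy, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical token labels (NOT study data): 3 positive tokens, 5 negative.
gold_labels = [1, 1, 1, 0, 0, 0, 0, 0]
pred_labels = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(gold_labels, pred_labels)
```

Because only a minority of words fall in the positive class, accuracy is dominated by the majority class while F1 is not. This may be one reason the reported accuracy values sit closer to 1 than the F1-scores do.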
BACKGROUND Negation and speculation unrelated to abnormal findings can lead to false-positive alarms for automatic radiology report highlighting or flagging by laboratory information systems. OBJECTIVE This internal validation study evaluates the performance of NLP methods (NegEx, NegBio, NegBERT, and Transformers). METHODS We annotated all negative and speculative statements unrelated to abnormal findings in reports. In experiment 1, we fine-tuned several Transformer models (ALBERT, BERT, DeBERTa, DistilBERT, ELECTRA, ERNIE, RoBERTa, SpanBERT, and XLNet) and compared their performance using precision, recall, accuracy, and F1 scores. In experiment 2, we compared the best model from experiment 1 with three established negation and speculation detection algorithms (NegEx, NegBio, and NegBERT). RESULTS Our study collected 6000 radiology reports from three branches of Chi Mei Hospital, covering multiple imaging modalities and body parts. A total of 15.0% of words and 39.5% of important diagnostic keywords occurred in negative or speculative statements unrelated to abnormal findings. In experiment 1, all models achieved accuracy > 98% and F1 score > 90% on the test dataset. ALBERT showed the best performance (accuracy 99.1%, F1 score 95.8%). In experiment 2, ALBERT outperformed the optimized NegEx, NegBio, and NegBERT methods overall (accuracy 99.6%, F1 score 99.1%) and in the prediction of whether diagnostic keywords occur in speculative statements unrelated to abnormal findings. CONCLUSIONS The ALBERT deep learning method showed the best performance. Our results represent a significant advance in the clinical application of computer-aided notification systems.