Background: Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. Objective: We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. Methods: We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. Results: The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. Conclusions: We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users’ objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.
Introduction Diagnosing renal pathologies is important for performing treatments. However, classifying every glomerulus is difficult for clinicians; thus, a support system, such as a computer, is required. This paper describes the automatic classification of glomerular images using a convolutional neural network (CNN). Method To generate appropriate labeled data, annotation criteria including 12 features (e.g., “fibrous crescent”) were defined. The concordance among 5 clinicians was evaluated for 100 images using the kappa (κ) coefficient for each feature. Using the annotation criteria, 1 clinician annotated 10,102 images. We trained the CNNs to classify the features with an average κ ≥0.4 and evaluated their performance using the receiver operating characteristic–area under the curve (ROC–AUC). An error analysis was conducted and the gradient-weighted class activation mapping (Grad-CAM) was also applied; it expresses the CNN’s focusing point with a heat map when the CNN classifies the glomerular image for a feature. Results The average κ coefficient of the features ranged from 0.28 to 0.50. The ROC–AUC of the CNNs for test data varied from 0.65 to 0.98. Among the features, “capillary collapse” and “fibrous crescent” had high ROC–AUC values of 0.98 and 0.91, respectively. The error analysis and the Grad-CAM visually showed that the CNN could not distinguish between 2 different features that had similar visual structures or that occurred simultaneously. Conclusion The differences in the texture or frequency of the co-occurrence between the different features affected the CNN performance; thus, to improve the classification accuracy, methods such as segmentation are required.
Generalized language models that are pre-trained with a large corpus have achieved great performance on natural language tasks. While many pre-trained transformers for English are published, few models are available for Japanese text, especially in clinical medicine. In this work, we demonstrate the development of a clinical specific BERT model with a huge amount of Japanese clinical text and evaluate it on the NTCIR-13 MedWeb that has fake Twitter messages regarding medical concerns with eight labels. Approximately 120 million clinical texts stored at the University of Tokyo Hospital were used as our dataset. The BERT-base was pre-trained using the entire dataset and a vocabulary including 25,000 tokens. The pre-training was almost saturated at about 4 epochs, and the accuracies of Masked-LM and Next Sentence Prediction were 0.773 and 0.975, respectively. The developed BERT did not show significantly higher performance on the MedWeb task than the other BERT models that were pre-trained with Japanese Wikipedia text. The advantage of pre-training on clinical text may become apparent in more complex tasks on actual clinical text, and such an evaluation set needs to be developed.
The histopathological findings of the glomeruli from whole slide images (WSIs) of a renal biopsy play an important role in diagnosing and grading kidney disease. This study aimed to develop an automated computational pipeline to detect glomeruli and to segment the histopathological regions inside of the glomerulus in a WSI. In order to assess the significance of this pipeline, we conducted a multivariate regression analysis to determine whether the quantified regions were associated with the prognosis of kidney function in 46 cases of immunoglobulin A nephropathy (IgAN). The developed pipelines showed a mean intersection over union (IoU) of 0.670 and 0.693 for five classes (i.e., background, Bowman’s space, glomerular tuft, crescentic, and sclerotic regions) against the WSI of its facility, and 0.678 and 0.609 against the WSI of the external facility. The multivariate analysis revealed that the predicted sclerotic regions, even those that were predicted by the external model, had a significant negative impact on the slope of the estimated glomerular filtration rate after biopsy. This is the first study to demonstrate that the quantified sclerotic regions that are predicted by an automated computational pipeline for the segmentation of the histopathological glomerular components on WSIs impact the prognosis of kidney function in patients with IgAN.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.