Key Points
Question
Can machine learning be used to predict incident delirium in newly hospitalized patients using only data available in the electronic health record shortly after admission?
Findings
In this cohort study, classification models were trained using 5 different machine learning algorithms on 14 227 hospital stays and validated on a prospective test set of 3996 hospital stays. The gradient boosting machine model performed best, with an area under the receiver operating characteristic curve of 0.855.
Meaning
Machine learning can accurately predict delirium risk using electronic health record data on admission and outperforms the nurse-administered prediction rules currently used.
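For readers unfamiliar with the metric reported above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive case (here, a stay with incident delirium) receives a higher predicted risk than a randomly chosen negative case. The following is a minimal sketch of that definition, not the study's evaluation code; the function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study data.

```python
def auroc(labels, scores):
    """AUROC computed as the fraction of (positive, negative) pairs
    in which the positive case has the higher score; ties count as 0.5.
    This is the Mann-Whitney U statistic divided by the number of pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = delirium during the stay, 0 = no delirium.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]  # made-up predicted risks
toy_auc = auroc(labels, scores)
print(f"toy AUROC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable at the scale of thousands of hospital stays.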
Background: Chromatin organization is central to precise control of gene expression. In various eukaryotic species, domains of pervasive cis-chromatin interactions demarcate functional domains of the genomes. In the nematode Caenorhabditis elegans, however, pervasive chromatin contact domains are limited to the dosage-compensated sex chromosome, leaving the principle of C. elegans chromatin organization unclear. Transcription factor III C (TFIIIC) is a basal transcription factor complex for RNA polymerase III and is implicated in chromatin organization. TFIIIC binding without RNA polymerase III co-occupancy, referred to as extra-TFIIIC binding, has been implicated in insulating active and inactive chromatin domains in yeasts, flies, and mammalian cells. Whether extra-TFIIIC sites are present and contribute to chromatin organization in C. elegans remains unknown. Results: We identified 504 TFIIIC-bound sites lacking RNA polymerase III and TATA-binding protein co-occupancy, a characteristic of extra-TFIIIC sites, in C. elegans embryos. Extra-TFIIIC sites constituted half of all identified TFIIIC binding sites in the genome. Extra-TFIIIC sites formed dense clusters in cis. The clusters of extra-TFIIIC sites were highly over-represented within the distal arm domains of the autosomes, which presented a high level of heterochromatin-associated histone H3K9 trimethylation (H3K9me3). Furthermore, extra-TFIIIC clusters were embedded in the lamina-associated domains. Despite the heterochromatin environment of extra-TFIIIC sites, the individual clusters of extra-TFIIIC sites were themselves devoid of H3K9me3 and resided near individual H3K9me3-marked regions. Conclusion: Clusters of extra-TFIIIC sites were pervasive in the arm domains of C. elegans autosomes, near the outer boundaries of H3K9me3-marked regions. Given the reported activity of extra-TFIIIC sites in heterochromatin insulation in yeasts, our observation raises the possibility that TFIIIC may also demarcate heterochromatin in C. elegans.
Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, it requires large-scale labeled datasets, which are difficult to obtain. We aim to remove many barriers to dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier (NLC). Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters, with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters, with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
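To make the evaluation metrics above concrete, the sketch below computes per-class precision, recall, and F-measure for the four outcome labels, then a weighted average across classes. It assumes that "F-measure" means the balanced F1 score and that the weighted average weights each class by its number of true cases; the abstract does not state either choice. The function name `weighted_f_measure` and the toy labels are hypothetical and are not drawn from the annotated reports.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Return ({class: (precision, recall, f1)}, support-weighted mean F1).

    Assumption: F-measure is F1 (beta = 1) and the weighted average
    uses each class's count of true cases as its weight."""
    support = Counter(y_true)
    per_class = {}
    for c in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = (precision, recall, f1)
    weighted = sum(per_class[c][2] * support[c] for c in per_class) / len(y_true)
    return per_class, weighted


# Hypothetical toy labels using the four outcome classes named above.
y_true = ["negative", "negative", "left positive",
          "right positive", "bilateral positive", "negative"]
y_pred = ["negative", "left positive", "left positive",
          "right positive", "bilateral positive", "negative"]

per_class, weighted = weighted_f_measure(y_true, y_pred)
for label, (p, r, f) in per_class.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
print(f"weighted average F = {weighted:.3f}")
```

Weighting by class size keeps a rare class such as bilateral positive from dominating the summary, at the cost of hiding poor performance on that class, so the per-class values are worth reporting alongside the average.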