Background: Identifying patients with certain clinical criteria based on manual chart review of doctors’ notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this research is to evaluate the performance of traditional classifiers for identifying patients with Systemic Lupus Erythematosus (SLE) in comparison with a newer Bayesian word vector method.
Methods: We obtained clinical notes for patients with SLE diagnosis along with controls from the Rheumatology Clinic (662 total patients). Sparse bag-of-words (BOWs) and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) matrices were produced using NLP pipelines. These matrices were subjected to several different NLP classifiers: neural networks, random forests, naïve Bayes, support vector machines, and Word2Vec inversion, a Bayesian inversion method. Performance was measured by calculating accuracy and area under the Receiver Operating Characteristic (ROC) curve (AUC) of a cross-validated (CV) set and a separate testing set.
Results: We calculated the accuracy of the ICD-9 billing codes as a baseline to be 90.00% with an AUC of 0.900, the shallow neural network with CUIs to be 92.10% with an AUC of 0.970, the random forest with BOWs to be 95.25% with an AUC of 0.994, the random forest with CUIs to be 95.00% with an AUC of 0.979, and the Word2Vec inversion to be 90.03% with an AUC of 0.905.
Conclusions: Our results suggest that a shallow neural network with CUIs and random forests with both CUIs and BOWs are the best classifiers for this lupus phenotyping task. The Word2Vec inversion method failed to significantly beat the ICD-9 code classification, but yielded promising results. This method does not require explicit features and is more adaptable to non-binary classification tasks. The Word2Vec inversion is hypothesized to become more powerful with access to more data. Therefore, currently, the shallow neural networks and random forests are the desirable classifiers.
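The two performance measures named in the Methods, accuracy and AUC, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The AUC is computed through its pairwise (Mann-Whitney) interpretation, the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over every (positive, negative) pair, how often the positive
    is scored higher; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = SLE case, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# Assumed decision threshold of 0.5 to turn scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason to report the two together.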
Learn2Mine is a cloud-based environment developed to support the teaching of data science. This paper discusses the architecture of Learn2Mine, the research that guided its development, and the pilot implementation and formative assessment of its use in teaching data science. Learn2Mine was pilot-tested in Fall 2013 in an introductory data science course. At the end of the term, students were surveyed about their experiences with the environment. Quantitative analysis of survey data showed that student opinion about the usefulness of the tool for learning course content was positive. Through open-ended comments, students provided constructive feedback on how the system might be improved. To collect expert opinion on both the didactic and usability aspects of the Learn2Mine system, a number of experts were enlisted to try the system. Experts responded to a survey regarding criteria typically expected of instructional software, such as system usability and flexibility, as well as accuracy and organization of content. Overall, the responses from experts were extremely positive. A plan for further development of the system, based on these results, is presented along with information on the developers' plans for making the environment available for use at other institutions.
We introduce Learn2Mine, an education and analysis platform that integrates state-of-the-art data mining tools with effective feedback and training mechanisms in order to lower the barrier for domain experts and computer scientists to learn data science. Data science is the combination of statistical and computer science techniques in order to extract meaningful information from domain-specific datasets. Learn2Mine is a platform where students learn and practice techniques commonly used by data scientists. The Learn2Mine platform is a novel environment for teaching data science without requiring prerequisite knowledge, and with the idea that all knowledge bases can be enhanced by data science. It applies the principles of gamification, making the learning process more engaging and rewarding. Learn2Mine has been piloted with undergraduates; the ability to retry lessons and receive instant feedback has allowed them to engage with more sophisticated data science concepts than students in previous semesters. Learn2Mine will be continuously extended with new algorithms and lessons and will be completely open to the public beginning January 2014 (http://learn2mine.appspot.com). The next step is the completion of an extension framework giving international institutions and organizations of higher learning the ability to create their own lessons for students to perform.