Tian Bai scite author profile

Interpretable Representation Learning for Healthcare via Capturing Disease Progression through Time

Various deep learning models have recently been applied to predictive modeling of Electronic Health Records (EHR). In medical claims data, which is a particular type of EHR data, each patient is represented as a sequence of temporally ordered, irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnoses and the treatment provided during the visit. Based on the observation that different patient conditions have different temporal progression patterns, in this paper we propose a novel interpretable deep learning model, called Timeline. The main novelty of Timeline is a mechanism that learns time decay factors for every medical code. This allows Timeline to learn that chronic conditions have a longer-lasting impact on future visits than acute conditions. Timeline also has an attention mechanism that improves vector embeddings of visits. By analyzing the attention weights and disease progression functions of Timeline, it is possible to interpret the predictions and understand how risks of future visits change over time. We evaluated Timeline on two large-scale real-world data sets. The specific task was to predict the primary diagnosis category for the next hospital visit given previous visits. Our results show that Timeline has higher accuracy than state-of-the-art deep learning models based on RNNs. In addition, we demonstrate that the time decay factors and attention weights learned by Timeline are in accord with medical knowledge and that Timeline can provide useful insight into its predictions.
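The idea of a per-code time decay can be illustrated with a minimal sketch. The abstract does not give Timeline's actual decay or attention functions, so the exponential decay, the function names `decayed_weight` and `patient_vector`, and all numeric values below are assumptions chosen only for illustration.

```python
import math

def decayed_weight(attention, decay_rate, days_before_prediction):
    """Attention weight scaled by an exponential time decay (assumed form)."""
    return attention * math.exp(-decay_rate * days_before_prediction)

def patient_vector(visits, embeddings, decay_rates, attention):
    """Sum code embeddings over all visits, weighting each code by its
    decayed attention.

    visits: list of (days_before_prediction, [codes]) pairs.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for days, codes in visits:
        for code in codes:
            w = decayed_weight(attention[code], decay_rates[code], days)
            for i, x in enumerate(embeddings[code]):
                total[i] += w * x
    return total

# Hypothetical toy values: the "chronic" code decays slowly, the "acute" one quickly.
embeddings = {"chronic": [1.0, 0.0], "acute": [0.0, 1.0]}
decay_rates = {"chronic": 0.001, "acute": 0.1}
attention = {"chronic": 1.0, "acute": 1.0}
visits = [(100, ["chronic", "acute"])]  # one visit, 100 days before prediction

vec = patient_vector(visits, embeddings, decay_rates, attention)
```

With these toy rates, the chronic code still contributes most of its embedding after 100 days while the acute code contributes almost nothing, which is the qualitative behavior the abstract describes.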

Improving Medical Code Prediction from Clinical Text via Incorporating Online Knowledge Sources

Bai, Vučetić, 2019

Medical Concept Representation Learning from Multi-source Data

Bai, Egleston, Bleicher, et al. 2019

Representing words as low-dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for the word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between the ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.
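For orientation, the standard (unmodified) PMI between two codes can be computed from claim co-occurrence counts as sketched below. This is not the modified PMI proposed in the paper, whose definition is not given in the abstract. The function name `pmi_table` and the placeholder codes are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(claims):
    """Return {(code_a, code_b): PMI} for code pairs that co-occur in a claim."""
    n = len(claims)
    code_counts = Counter()
    pair_counts = Counter()
    for claim in claims:
        codes = sorted(set(claim))  # sorted so each pair has one canonical key
        code_counts.update(codes)
        pair_counts.update(combinations(codes, 2))
    table = {}
    for (a, b), n_ab in pair_counts.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), probabilities as claim fractions
        p_ab = n_ab / n
        p_a = code_counts[a] / n
        p_b = code_counts[b] / n
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table

# Hypothetical claims mixing placeholder codes from two ontologies.
claims = [
    ["ICD9:A", "CPT:X"],
    ["ICD9:A", "CPT:X"],
    ["ICD9:B", "CPT:Y"],
    ["ICD9:B", "CPT:X"],
]
pmi = pmi_table(claims)
```

Pairs that co-occur more often than chance get positive PMI and pairs that co-occur less often get negative PMI. Ranking codes from one ontology by their PMI with a code from the other is one simple way to think about cross-referencing.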

EHR phenotyping via jointly embedding medical concepts and words into a unified vector space

Bai, Chanda, Egleston, et al. 2018, BMC Med Inform Decis Mak

Background: There has been an increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data to gain insights about the patterns of care and health outcomes. EHRs contain structured data such as diagnostic codes and laboratory tests, as well as unstructured free-text data in the form of clinical notes, which provide more detail about the condition and treatment of patients.

Methods: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns the relationships between words and medical codes by integrating clinical notes and sets of accompanying medical codes and by defining joint contexts for each observed word and medical code.

Results: In our experiments, we learned joint representations using MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for 6 diseases discovered by our method and a baseline method. The experimental results show that for each of the 6 diseases our method finds highly relevant words. We also show that our representations can be very useful when predicting the reason for the next visit.

Conclusions: The jointly learned representations of medical concepts and words capture not only similarity between codes or words themselves, but also similarity between codes and words. They can be used to extract phenotypes of different diseases. The representations learned by the joint model are also useful for construction of patient features.
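One way to picture "joint contexts" is as (target, context) training pairs that link the words of a note to nearby words and to the codes attached to the same record. The sketch below is an assumption about how such pairs could be built, not the paper's actual scheme. The function name `joint_training_pairs`, the `window` parameter, and the placeholder codes are hypothetical.

```python
def joint_training_pairs(note_tokens, codes, window=2):
    """Build (target, context) pairs over the words and codes of one record."""
    pairs = []
    for i, word in enumerate(note_tokens):
        lo = max(0, i - window)
        hi = min(len(note_tokens), i + window + 1)
        # word -> neighboring words inside the window
        pairs += [(word, note_tokens[j]) for j in range(lo, hi) if j != i]
        # word -> every code attached to the same record
        pairs += [(word, code) for code in codes]
    for code in codes:
        # code -> every other code of the record
        pairs += [(code, other) for other in codes if other != code]
        # code -> every word of the note
        pairs += [(code, word) for word in note_tokens]
    return pairs

# Hypothetical record: a short tokenized note with two placeholder codes.
tokens = ["chest", "pain", "on", "exertion"]
codes = ["CODE_1", "CODE_2"]
pairs = joint_training_pairs(tokens, codes, window=1)
```

Pairs like these could be fed to any skip-gram style trainer, so that words and codes end up in the same vector space and can be compared directly.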

Peer assessment of CS doctoral programs shows strong correlation with faculty citations

et al. 2018

... counts of papers published in computer science journals and number of highly cited faculty). The final ranking is a weighted average of these measures. The scientometrics community criticized this approach because the choice of weights is not clearly justified [4,6]. The U.S. News ranking of doctoral programs in engineering (footnote b) uses a weighted average of objective measures and subjective measures. As with the ARWU, justification for the ranking formula is lacking.

The ranking of computer science doctoral programs published in 2010 by the U.S. National Research Council (NRC) [2] is notable for its effort to provide a justifiable ranking formula. The NRC collected objective measures and surveyed faculty to assess peer institutions on multiple measures of perceived quality. The NRC ranking group then built a regression model that predicts subjective measures based on the objective measures. The resulting regression model was used to provide ranking order. Unfortunately, the subjective and objective data collected during the NRC ranking project had questionable quality [3] and the resulting ranking was not well received in the computer science community (footnote c). We find the NRC idea of calculating the ranking formula through regression modeling better justified than the alternatives. In this article, we address the data-quality issue that plagued the NRC ranking project by collecting unbiased objective data about programs in the form of faculty-citation indices and demonstrate that regression analysis is a viable approach for ranking computer science doctoral programs.

We also obtain valuable insights into the relationship between peer assessments and objective measures.

Footnote b: https://www.usnews.com/education/bestgraduate-schools/articles/engineeringschools-methodology?int=9d0e08
Footnote c: http://www.chronicle.com/article/Too-Big-toFail/127212/

[...] When people pages clearly separated such faculty from primary appointments, we included only the primary appointments in our list. When the people pages did not provide discriminable information about affiliations, we included all listed tenure-track faculty. The details of faculty selection for each university are in the "CS Department Data" file we maintain on our ranking webpage (footnote f). Overall, we collected the names of 4,728 tenure-track faculty members, including 1,114 assistant professors, 1,271 associate professors, and 2,343 full professors. Since assistant professors are typically only starting their academic careers and publication records, we treated them differently from associate and full professors, and for the rest of this article, we refer to associate and full professors as "senior faculty."

Ranking Data

The distribution of program size is quite varied, with median faculty size of 22 positions, mode of 15, minimum of four, and maximum of 143 (CMU). The Pearson correlation between department size and USN CS score of the 119 programs ranked by U.S. News is 0.676, indicating larger departments are more likely to be higher ra...
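The two statistical tools mentioned in this excerpt, Pearson correlation and regression of a subjective score on an objective measure, can be sketched as follows. This is a single-feature illustration only: the program names and numbers are made up and are not data from the article, and the helper names `pearson` and `fit_line` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

# Made-up toy numbers for illustration only (not taken from the article).
citation_index = {"P1": 10.0, "P2": 20.0, "P3": 30.0, "P4": 40.0}
peer_score = {"P1": 2.0, "P2": 3.1, "P3": 3.9, "P4": 5.0}

names = list(citation_index)
xs = [citation_index[p] for p in names]
ys = [peer_score[p] for p in names]

r = pearson(xs, ys)
a, b = fit_line(xs, ys)

# Rank programs by the score the fitted model predicts from the objective measure.
ranking = sorted(names, key=lambda p: a + b * citation_index[p], reverse=True)
```

Ranking by the predicted score rather than the raw peer score is the core of the regression-based approach: the fitted coefficients, not hand-picked weights, determine how much the objective measure counts.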

Interpretable Representation Learning for Healthcare via Capturing Disease Progression through Time
