Automatically identifying thrombotic phenotypes from clinical data, particularly clinical text, can be challenging. Although many investigators have developed targeted information extraction methods for identifying thrombotic phenotypes from radiology notes, these methods can be time-consuming to train, require large amounts of training data, and may miss subtle textual clues, predictive of a thrombotic phenotype, that appear in notes other than the radiology note. We developed a generalizable, data-driven framework for learning, characterizing, and visualizing clinical concepts predictive of thrombotic phenotypes from both radiology notes and discharge summaries.