Cell lines are key tools for preclinical cancer research, but it remains unclear how well they represent patient tumor samples. Identifying cell line models that best represent the features of particular tumor samples, and identifying tumor types that lack in vitro model representation, remain important challenges. Gene expression has been shown to provide rich information that can be used to identify tumor subtypes, as well as to predict the genetic dependencies and chemical vulnerabilities of cell lines. However, direct comparisons of tumor and cell line transcriptional profiles are complicated by systematic differences, such as the presence of immune and stromal cells in tumor samples and differences in the cancer-type composition of cell line and tumor expression datasets. To address these challenges, we developed an unsupervised alignment method (Celligner) and applied it to integrate several large-scale cell line and tumor RNA-Seq datasets. While our method aligns the majority of cell lines with tumor samples of the same cancer type, it also reveals large differences in tumor/cell line similarity across disease types. Furthermore, Celligner identifies a distinct group of several hundred cell lines from diverse lineages that present a more mesenchymal and undifferentiated transcriptional state and that exhibit distinct chemical and genetic dependencies. This method could thus be used to guide the selection of cell lines that more closely resemble patient tumors and improve the clinical translation of insights gained from cell line models.