Background: We aimed to develop a deep learning model to classify primary bone tumors from preoperative radiographs and to compare its performance with that of radiologists. Methods: A total of 1356 patients (2899 images) with histologically confirmed primary bone tumors and preoperative radiographs were identified from the pathology databases of five institutions. Radiologists manually cropped the images to label the lesions. We evaluated the model's binary discriminatory capacity (benign versus not benign and malignant versus not malignant) and its three-way classification performance (benign versus intermediate versus malignant). The generalizability of the model was investigated on an external test set. Final model performance was compared with the interpretations of five radiologists of varying levels of experience using permutation tests. Findings: For benign vs. not benign, the model achieved an area under the curve (AUC) of 0.894 and 0.877 on cross-validation and external testing, respectively. For malignant vs. not malignant, the model achieved an AUC of 0.907 and 0.916 on cross-validation and external testing, respectively. For three-way classification, the model achieved 72.1% accuracy vs. 74.6% and 72.1% for the two subspecialists on cross-validation (p = 0.03 and p = 0.52, respectively). On external testing, the model achieved 73.4% accuracy vs. 69.3%, 73.4%, 73.1%, 67.9%, and 63.4% for the two subspecialists and three junior radiologists (p = 0.14, p = 0.89, p = 0.93, p = 0.02, and p < 0.01 for radiologists 1–5, respectively). Interpretation: Deep learning can classify primary bone tumors from conventional radiographs in a multi-institutional dataset with accuracy similar to that of subspecialists and better than that of junior radiologists.
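The abstract states that model and radiologist accuracies were compared with permutation tests but does not describe the exact procedure. The following is a minimal sketch of one common variant, a paired sign-flip permutation test on per-case correctness, and is an assumption rather than the authors' implementation. The function name `paired_permutation_test`, its parameters, and the 0/1 correctness encoding are hypothetical and chosen for illustration only.

```python
import random


def paired_permutation_test(correct_a, correct_b, n_permutations=10000, seed=0):
    """Two-sided paired permutation test for a difference in accuracy.

    correct_a, correct_b: per-case correctness (1 = correct, 0 = wrong) for
    two readers (e.g. model and one radiologist) on the SAME cases.
    Hypothetical helper; not taken from the paper.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both readers must be scored on the same cases")

    rng = random.Random(seed)

    # Per-case difference in correctness: +1, 0, or -1.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the two readers are exchangeable on each
        # case, so the sign of each per-case difference is flipped at random.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

Called with two lists of per-case correctness flags for the same test cases, the function returns an estimated two-sided p-value; a small value suggests the accuracy gap is unlikely under exchangeability of the two readers. Working with the integer sum of differences rather than the accuracy itself avoids floating-point comparisons and gives the same ordering, since both are divided by the same number of cases.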