The aim of this study was to systematically review the performance of deep learning technology in detecting and classifying pulmonary nodules on computed tomography (CT) scans that were not from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) database. Furthermore, we explored the difference in performance when the deep learning technology was applied to test datasets different from the training datasets. Only peer-reviewed, original research articles utilizing deep learning technology were included in this study, and only results from testing on datasets other than the LIDC-IDRI were included. We searched a total of six databases: EMBASE, PubMed, Cochrane Library, the Institute of Electrical and Electronics Engineers, Inc. (IEEE), Scopus, and Web of Science. This resulted in 1782 studies after duplicates were removed, and a total of 26 studies were included in this systematic review. Three studies explored the performance of pulmonary nodule detection only, 16 studies explored the performance of pulmonary nodule classification only, and 7 studies had reports of both pulmonary nodule detection and classification. Three different deep learning architectures were mentioned amongst the included studies: convolutional neural network (CNN), massive training artificial neural network (MTANN), and deep stacked denoising autoencoder extreme learning machine (SDAE-ELM). The studies reached a classification accuracy between 68–99.6% and a detection accuracy between 80.6–94%. Performance of deep learning technology in studies using different test and training datasets was comparable to studies using same type of test and training datasets. In conclusion, deep learning was able to achieve high levels of accuracy, sensitivity, and/or specificity in detecting and/or classifying nodules when applied to pulmonary CT scans not from the LIDC-IDRI database.