Purpose: Laryngoscopy, the most common diagnostic method for vocal cord lesions (VCLs), relies mainly on subjective visual inspection by otolaryngologists. This study aimed to establish a highly objective computer-aided VCLs diagnosis system based on a deep convolutional neural network (DCNN) and transfer learning. Methods: To classify VCLs, our method combined a DCNN backbone with transfer learning, fine-tuning the system specifically on a laryngoscopy image dataset. A laryngoscopy image database was collected to train the proposed system. Its diagnostic performance was compared with that of other DCNN-based models. F1 scores and receiver operating characteristic curves were analyzed to evaluate the performance of the system. Results: Surpassing existing VCLs diagnosis methods, the proposed system achieved an overall accuracy of 80.23%, an F1 score of 0.7836, and an area under the curve (AUC) of 0.9557 for four fine-grained classes of VCLs, namely, normal, polyp, keratinization, and carcinoma. It also demonstrated robust capacity for distinguishing urgent cases (keratinization, carcinoma) from non-urgent cases (normal, polyp), with an overall accuracy of 0.939, a sensitivity of 0.887, a specificity of 0.993, and an AUC of 0.9828. The proposed method also outperformed clinicians in the classification of normal, polyp, and carcinoma cases at an extremely low time cost. Conclusion: The VCLs diagnosis system succeeded in using a DCNN to distinguish the most common VCLs from normal cases, showing practical potential for improving overall diagnostic efficacy in VCLs examinations. The proposed system could be integrated into the conventional laryngoscopy workflow as a highly objective auxiliary method.
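The urgent versus non-urgent evaluation reported above can be illustrated with a minimal sketch. It assumes only the class grouping stated in the abstract (keratinization and carcinoma as urgent, normal and polyp as non-urgent). The function names `to_urgent` and `binary_metrics` and the toy labels are hypothetical and are not taken from the study's code or data.

```python
# Illustrative sketch only: the class grouping follows the abstract;
# the helper names and the toy labels below are assumptions.
URGENT = {"keratinization", "carcinoma"}


def to_urgent(label):
    """Map a four-class label to True (urgent) or False (non-urgent)."""
    return label in URGENT


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for urgent vs non-urgent.

    Assumes at least one urgent and one non-urgent case in y_true,
    otherwise the corresponding ratio would divide by zero.
    """
    pairs = [(to_urgent(t), to_urgent(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)              # urgent, flagged urgent
    tn = sum((not t) and (not p) for t, p in pairs)  # non-urgent, flagged non-urgent
    fp = sum((not t) and p for t, p in pairs)        # non-urgent, flagged urgent
    fn = sum(t and (not p) for t, p in pairs)        # urgent, missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy four-class labels (not study data).
y_true = ["normal", "polyp", "keratinization", "carcinoma", "carcinoma", "normal"]
y_pred = ["normal", "normal", "carcinoma", "carcinoma", "polyp", "keratinization"]
metrics = binary_metrics(y_true, y_pred)
```

Note that a confusion within a group (for example, a polyp predicted as normal) counts as correct in the binary setting, which is one reason a binary accuracy can exceed the four-class accuracy.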