Passion fruit, renowned for its significant nutritional, medicinal, and economic value, is extensively cultivated in subtropical regions such as China, India, and Vietnam. In the production and processing industry, the quality grading of passion fruit plays a crucial role in the supply chain. However, the current process relies heavily on manual labor, resulting in inefficiency and high costs, which reflects the importance of expanding the application of fruit appearance quality classification mechanisms based on computer vision. Moreover, the existing passion fruit detection algorithms mainly focus on real-time detection and overlook the quality-classification aspect. This paper proposes the ATC-YOLOv5 model based on deep learning for passion fruit detection and quality classification. First, an improved Asymptotic Feature Pyramid Network (APFN) is utilized as the feature-extraction network, which is the network modified in this study by adding weighted feature concat pathways. This optimization enhances the feature flow between different levels and nodes, allowing for the adaptive and asymptotic fusion of richer feature information related to passion fruit quality. Secondly, the Transformer Cross Stage Partial (TRCSP) layer is constructed based on the introduction of the Multi-Head Self-Attention (MHSA) layer in the Cross Stage Partial (CSP) layer, enabling the network to achieve a better performance in modeling long-range dependencies. In addition, the Coordinate Attention (CA) mechanism is introduced to enhance the network’s learning capacity for both local and non-local information, as well as the fine-grained features of passion fruit. Moreover, to validate the performance of the proposed model, a self-made passion fruit dataset is constructed to classify passion fruit into four quality grades. The original YOLOv5 serves as the baseline model. According to the experimental results, the mean average precision (mAP) of ATC-YOLOv5 reaches 95.36%, and the mean detection time (mDT) is 3.2 ms, which improves the mAP by 4.83% and the detection speed by 11.1%, and the number of parameters is reduced by 10.54% compared to the baseline, maintaining the lightweight characteristics while improving the accuracy. These experimental results validate the high detection efficiency of the proposed model for fruit quality classification, contributing to the realization of intelligent agriculture and fruit industries.