The online automated maturity grading and counting of tomato fruits has a certain promoting effect on digital supervision of fruit growth status and unmanned precision operations during the planting process. The traditional grading and counting of tomato fruit maturity is mostly done manually, which is time-consuming and laborious work, and its precision depends on the accuracy of human eye observation. The combination of artificial intelligence and machine vision has to some extent solved this problem. In this work, firstly, a digital camera is used to obtain tomato fruit image datasets, taking into account factors such as occlusion and external light interference. Secondly, based on the tomato maturity grading task requirements, the MHSA attention mechanism is adopted to improve YOLOv8’s backbone to enhance the network’s ability to extract diverse features. The Precision, Recall, F1-score, and mAP50 of the tomato fruit maturity grading model constructed based on MHSA-YOLOv8 were 0.806, 0.807, 0.806, and 0.864, respectively, which improved the performance of the model with a slight increase in model size. Finally, thanks to the excellent performance of MHSA-YOLOv8, the Precision, Recall, F1-score, and mAP50 of the constructed counting models were 0.990, 0.960, 0.975, and 0.916, respectively. The tomato maturity grading and counting model constructed in this study is not only suitable for online detection but also for offline detection, which greatly helps to improve the harvesting and grading efficiency of tomato growers. The main innovations of this study are summarized as follows: (1) a tomato maturity grading and counting dataset collected from actual production scenarios was constructed; (2) considering the complexity of the environment, this study proposes a new object detection method, MHSA-YOLOv8, and constructs tomato maturity grading models and counting models, respectively; (3) the models constructed in this study are not only suitable for online grading and counting but also for offline grading and counting.