Hand bone age, a measure of biological age, accurately reflects an individual's level of development and maturity. Bone age assessment in adolescents provides a basis for evaluating their growth and development and for predicting adult height. In this study, a deep convolutional neural network (CNN) model based on fine-grained image classification is proposed and evaluated on the hand bone image dataset provided by the Radiological Society of North America (RSNA). During hand bone image recognition, the model automatically locates informative regions and extracts local features, which are then combined with the global features of the complete image for bone age classification. The method performs end-to-end bone age assessment without any image annotation other than the bone age labels, improving both the speed and the accuracy of assessment. Experimental results show that the proposed method achieves recognition accuracies of 66.38% and 68.63% for males and females, respectively, on the RSNA dataset, with mean absolute errors of 3.71 ± 7.55 and 3.81 ± 7.74 months for males and females, respectively. The test time for each image is approximately 35 ms. The method performs well and outperforms existing methods for bone age assessment based on weakly supervised fine-grained image classification.

INDEX TERMS Bone age assessment, Deep learning, Convolutional neural network, Fine-grained image.
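To make the reported metrics concrete, the following minimal sketch shows one way figures of this form could be computed. It rests on assumptions not stated in the abstract: that recognition accuracy is the fraction of images whose predicted bone age class exactly matches the label, and that the value after "±" is the standard deviation of the per-image absolute errors. The function name `evaluate` and the toy values are hypothetical and are not taken from the RSNA dataset.

```python
from statistics import mean, pstdev


def evaluate(true_months, pred_months):
    """Return (accuracy, MAE, std of absolute errors) for bone ages in months.

    Assumption: accuracy counts exact matches between predicted and true
    bone age classes; the spread is the population standard deviation of
    the absolute errors.
    """
    if not true_months or len(true_months) != len(pred_months):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(t - p) for t, p in zip(true_months, pred_months)]
    accuracy = sum(e == 0 for e in abs_errors) / len(abs_errors)
    return accuracy, mean(abs_errors), pstdev(abs_errors)


# Hypothetical toy labels and predictions, in months.
true_months = [120, 132, 96, 150]
pred_months = [120, 126, 96, 162]

acc, mae, std = evaluate(true_months, pred_months)
print(f"accuracy={acc:.2%}, MAE={mae:.2f} ± {std:.2f} months")
```

Under these assumptions, a standard deviation larger than the mean absolute error, as in the reported results, is plausible when most predictions are exact and a minority are off by many months.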