The financial technology service industry involves a large volume of image and text processing tasks. By automating this processing, financial institutions can substantially reduce labor costs, improve overall operational efficiency, and identify and predict risks more accurately, thereby strengthening their risk management capabilities. However, the recognition accuracy of existing image symbol recognition and scene text detection methods can degrade when processing complex scenes, low-resolution images, or text affected by occlusion, distortion, and other factors. To address this, this paper investigates the application of deep learning-based intelligent image recognition in financial technology services. It first elaborates the application scenarios of image symbol recognition and scene text detection in financial technology services. The ASTER model is then improved: incorporating attention-based sequential decoding allows the model to capture both local information and global dependencies in the feature sequence, thereby improving the recognition accuracy of the image symbol recognition model. For scene text detection, the method focuses on the center point position of each text instance and aggregates pixels that share the same center point, which reduces interference between adjacent texts to some extent and yields more accurate text segmentation. Experimental results validate the effectiveness of the proposed method.