The accuracy of galaxy photometric redshifts (photo-z) can significantly affect the analysis of weak gravitational lensing measurements, especially for future high-precision surveys. In this work, we extract photo-z information from both the galaxy flux and image data expected from the China Space Station Telescope (CSST) using neural networks. We generate mock galaxy images based on observational images from the Advanced Camera for Surveys of the Hubble Space Telescope (HST-ACS) and the COSMOS catalogs, taking the CSST instrumental effects into account. Galaxy flux data are then measured directly from these images by aperture photometry. A Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN) are constructed to predict photo-z from fluxes and images, respectively. We also propose an efficient hybrid network that combines the MLP and CNN through transfer learning, in order to investigate how much the result improves when both flux and image data are included. We find that the photo-z accuracy and outlier fraction reach $\sigma_{\rm NMAD} = 0.023$ and $\eta = 1.43\%$ for the MLP using flux data only, and $\sigma_{\rm NMAD} = 0.025$ and $\eta = 1.21\%$ for the CNN using image data only. The hybrid transfer network efficiently improves the result further, to $\sigma_{\rm NMAD} = 0.020$ and $\eta = 0.90\%$. These approaches yield similar galaxy median and mean redshifts of 0.8 and 0.9, respectively, over the redshift range from 0 to 4. This indicates that our networks can effectively extract photo-z information from the CSST galaxy flux and image data.
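
The scatter $\sigma_{\rm NMAD}$ and outlier fraction $\eta$ quoted above are not defined in this passage. As a minimal sketch, the block below assumes the conventions commonly used in the photo-z literature: $\sigma_{\rm NMAD} = 1.48 \times {\rm median}\left(\left|\delta - {\rm median}(\delta)\right|\right)$ with $\delta = (z_{\rm pred} - z_{\rm true})/(1 + z_{\rm true})$, and $\eta$ as the fraction of galaxies with $|\delta| > 0.15$. The function names, the 0.15 threshold, and the synthetic catalog are illustrative assumptions, not the pipeline used in this work.

```python
import numpy as np


def normalized_residuals(z_pred, z_true):
    """Return delta = (z_pred - z_true) / (1 + z_true) for each galaxy."""
    z_pred = np.asarray(z_pred, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    return (z_pred - z_true) / (1.0 + z_true)


def sigma_nmad(z_pred, z_true):
    """Normalized median absolute deviation of the residuals.

    Assumed definition: 1.48 * median(|delta - median(delta)|).
    """
    delta = normalized_residuals(z_pred, z_true)
    return 1.48 * np.median(np.abs(delta - np.median(delta)))


def outlier_fraction(z_pred, z_true, threshold=0.15):
    """Fraction of galaxies with |delta| above the threshold.

    The default threshold of 0.15 is an assumed, commonly used value.
    """
    delta = normalized_residuals(z_pred, z_true)
    return float(np.mean(np.abs(delta) > threshold))


# Synthetic catalog for illustration only (not CSST data): true redshifts
# drawn uniformly in [0, 4], predictions with Gaussian scatter in delta.
rng = np.random.default_rng(0)
z_true_demo = rng.uniform(0.0, 4.0, size=10_000)
z_pred_demo = z_true_demo + 0.02 * (1.0 + z_true_demo) * rng.standard_normal(10_000)

print("sigma_NMAD:", sigma_nmad(z_pred_demo, z_true_demo))
print("eta:", outlier_fraction(z_pred_demo, z_true_demo))
```

Because both statistics are built on the median or on a simple count, a small number of catastrophic failures changes $\eta$ but barely moves $\sigma_{\rm NMAD}$, which is why the two are usually reported together.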