In photoacoustic tomography (PAT), conventional image reconstruction methods generally assume an ideal point-like ultrasonic detector. This assumption is appropriate when the receiving surface of the detector is sufficiently small and/or the distance between the imaged object and the detector is sufficiently large. However, it does not hold in endoscopic applications of PAT. In this study, we propose a model-based image reconstruction method for endoscopic photoacoustic tomography (EPAT) that accounts for the effect of the detector response on image quality. We construct a forward model that physically describes the EPAT imaging process, including the generation of the initial pressure through optical absorption and thermoelastic expansion, the propagation of photoacoustic waves in tissue, and the acoustic measurement. The model outputs the theoretical sampled voltage signal, which is the response of the ultrasonic detector to the acoustic pressure reaching its receiving surface. Images of the optical absorption energy density distribution on cross-sections of the imaged luminal structures are then reconstructed from the sampled voltage signals output by the detector through iterative inversion of the forward model. Compared with conventional approaches based on back-projection and on other imaging models, our method improves the quality and spatial resolution of the reconstructed images.
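
The reconstruction principle can be illustrated with a minimal numerical sketch. It is not the forward model of this study, only a toy linear surrogate under stated assumptions: the acoustic propagation is replaced by a random matrix `A`, the detector response by a short hypothetical impulse response `h` applied as a causal convolution, and the iterative inversion by a projected Landweber iteration with a non-negativity constraint. All names, sizes, and values are illustrative.

```python
import numpy as np

def convolution_matrix(h, n):
    """Causal Toeplitz matrix H such that H @ p == np.convolve(h, p)[:n]."""
    H = np.zeros((n, n))
    for k, hk in enumerate(h):
        H += hk * np.eye(n, k=-k)  # place h[k] on the k-th sub-diagonal
    return H

def reconstruct(M, v, n_iter=2000):
    """Projected Landweber iteration for min ||M x - v||^2 subject to x >= 0."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        x = x + step * M.T @ (v - M @ x)  # gradient step on the data misfit
        x = np.maximum(x, 0.0)            # absorbed energy density is non-negative
    return x

rng = np.random.default_rng(0)
n_samples, n_pixels = 64, 16

# ASSUMPTION: toy stand-in for the acoustic propagation operator.
A = rng.standard_normal((n_samples, n_pixels))
# ASSUMPTION: hypothetical detector impulse response (not a measured one).
h = np.array([1.0, 0.5, 0.25])
H = convolution_matrix(h, n_samples)

M = H @ A                                 # pressure -> sampled voltage
x_true = rng.uniform(0.0, 1.0, n_pixels)  # synthetic absorption map
v = M @ x_true                            # noiseless synthetic voltage signal

x_rec = reconstruct(M, v)
rel_err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Including `H` in the system matrix `M` is what distinguishes this kind of inversion from one that assumes an ideal point-like detector, for which `H` would reduce to the identity. With measured data, noise and model mismatch would typically call for regularization or early stopping, which this sketch omits.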