The excessive time required by fluorescence diffuse optical tomography (fDOT) image reconstruction based on path-history fluorescence Monte Carlo model is its primary limiting factor. Herein, we present a method that accelerates fDOT image reconstruction. We employ three-level parallel architecture including multiple nodes in cluster, multiple cores in central processing unit (CPU), and multiple streaming multiprocessors in graphics processing unit (GPU). Different GPU memories are selectively used, the data-writing time is effectively eliminated, and the data transport per iteration is minimized. Simulation experiments demonstrated that this method can utilize general-purpose computing platforms to efficiently implement and accelerate fDOT image reconstruction, thus providing a practical means of using path-history-based fluorescence Monte Carlo model for fDOT imaging.