To monitor blood glucose concentration (BGC) efficiently and accurately under the combined influence of several factors, quantitative in vitro blood glucose detection was studied using photoacoustic temporal spectroscopy (PTS) combined with a fusion deep neural network (fDNN). A photoacoustic detection system subject to five influencing factors was set up, and 625 time-resolved photoacoustic signals of rabbit blood were collected under different influencing-factor conditions. Because temporal signals are sequential, a one-dimensional convolutional neural network (1DCNN) was established to extract features containing BGC information. After parameter optimization and tuning, the mean square error (MSE) of BGC was 0.51001 mmol/L on the 125 test samples. Then, because temporal signals exhibit long-term dependencies, a long short-term memory (LSTM) module was connected to the 1DCNN to improve the prediction accuracy of BGC. With the optimal number of LSTM layers, the MSE of BGC decreased to 0.32104 mmol/L. To further improve prediction accuracy, a self-attention mechanism (SAM) module was incorporated to form the fDNN model, i.e., 1DCNN-SAM-LSTM. The fDNN model not only combines the temporal feature extraction of the 1DCNN with the long-term memory of the LSTM, but also focuses learning on the features most relevant to BGC. Comparison results show that the fDNN model outperforms the other six models: the determination coefficient of BGC for the test set was 0.990, and the MSE reached 0.1432 mmol/L. These results demonstrate that PTS combined with 1DCNN-SAM-LSTM achieves higher BGC prediction accuracy under the combined influence of various factors and greatly improves detection efficiency.
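The two evaluation metrics quoted above, the MSE and the determination coefficient, can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and are not taken from the study; the sketch assumes the standard definitions, MSE as the mean of squared residuals and R² as one minus the ratio of the residual to the total sum of squares.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between reference and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted BGC values in mmol/L (illustration only).
reference = [4.0, 6.0, 8.0, 10.0]
predicted = [4.5, 5.5, 8.5, 9.5]

print(f"MSE = {mean_squared_error(reference, predicted):.3f}")  # MSE = 0.250
print(f"R^2 = {r_squared(reference, predicted):.3f}")           # R^2 = 0.950
```

In this toy case every prediction is off by 0.5 mmol/L, so the MSE is 0.25 and, with a total sum of squares of 20, R² is 0.95. A model that tracks the reference values more closely drives the MSE toward 0 and R² toward 1.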