For millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems, a hybrid processing architecture is essential to significantly reduce complexity and cost, but it is challenging to optimize jointly over the transmitter and receiver. In this paper, deep learning (DL) is applied to design a novel joint hybrid processing framework (JHPF) that allows end-to-end optimization through backpropagation. The proposed framework comprises three parts: a hybrid processing designer, which outputs the hybrid processing matrices for the transceiver by using neural networks (NNs); a signal flow simulator, which simulates the signal transmission over the air; and a signal demodulator, which maps the detected symbols to the original bits by using an NN. By minimizing the cross-entropy loss between the recovered and original bits, the proposed framework optimizes the analog and digital processing matrices at the transceiver jointly and implicitly, instead of approximating pre-designed label matrices, and its trainability is proved theoretically. It can also be applied directly to orthogonal frequency division multiplexing systems by simply modifying the structure of the training data. Simulation results show that the proposed DL-JHPF outperforms existing hybrid processing schemes, is robust to mismatched channel state information and channel scenarios, and has significantly reduced runtime.

INDEX TERMS mmWave massive MIMO, deep learning, hybrid processing design, end-to-end optimization.
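To make the three-part structure concrete, the following is a minimal NumPy sketch of the forward pass only: processing matrices, a simulated signal flow, and a bit-level cross-entropy. It is not the paper's implementation. It assumes the standard narrowband hybrid model, in which the symbol estimate is W_bb^H W_rf^H (H F_rf F_bb s + n), and QPSK symbols. The hybrid matrices are drawn at random where the framework would use NN outputs, and `placeholder_demodulator` is a hypothetical sigmoid stand-in for the NN demodulator. All function names, symbol names, and dimensions are illustrative assumptions.

```python
import numpy as np

def analog_matrix(phases):
    """Phase-shifter matrix: every entry has the same magnitude, 1/sqrt(n_rows)."""
    return np.exp(1j * phases) / np.sqrt(phases.shape[0])

def signal_flow(s, H, F_rf, F_bb, W_rf, W_bb, noise):
    """Narrowband hybrid link (assumed model): W_bb^H W_rf^H (H F_rf F_bb s + n)."""
    y = H @ (F_rf @ (F_bb @ s)) + noise
    return W_bb.conj().T @ (W_rf.conj().T @ y)

def bit_cross_entropy(bits, probs, eps=1e-12):
    """Mean binary cross-entropy between true bits and predicted P(bit = 1)."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(bits * np.log(p) + (1 - bits) * np.log(1 - p)))

def placeholder_demodulator(s_hat, scale=4.0):
    """Stand-in for the NN demodulator (assumption): sigmoid on the I/Q parts."""
    llr = -scale * np.concatenate([s_hat.real, s_hat.imag])
    return 1.0 / (1.0 + np.exp(-llr))

# Illustrative sizes only (hypothetical, not the paper's settings).
rng = np.random.default_rng(0)
n_t, n_r, n_rf, n_s = 16, 8, 2, 2

bits = rng.integers(0, 2, size=2 * n_s)
# QPSK: bit 0 -> +1/sqrt(2), bit 1 -> -1/sqrt(2) on each of I and Q.
s = ((1 - 2 * bits[:n_s]) + 1j * (1 - 2 * bits[n_s:])) / np.sqrt(2)

# Random stand-ins for the channel and for the matrices the NNs would output.
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
F_rf = analog_matrix(rng.uniform(0, 2 * np.pi, (n_t, n_rf)))
W_rf = analog_matrix(rng.uniform(0, 2 * np.pi, (n_r, n_rf)))
F_bb = rng.standard_normal((n_rf, n_s)) + 1j * rng.standard_normal((n_rf, n_s))
F_bb *= np.sqrt(n_s) / np.linalg.norm(F_rf @ F_bb)  # transmit power constraint
W_bb = rng.standard_normal((n_rf, n_s)) + 1j * rng.standard_normal((n_rf, n_s))
noise = 0.1 * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))

s_hat = signal_flow(s, H, F_rf, F_bb, W_rf, W_bb, noise)
loss = bit_cross_entropy(bits, placeholder_demodulator(s_hat))
print(f"cross-entropy with random (untrained) matrices: {loss:.3f}")
```

Because the matrices here are random, the printed loss carries no meaning beyond showing the data flow. In an end-to-end setup of the kind described above, the same scalar would be differentiated with respect to the NN parameters that generate the matrices, which calls for an automatic-differentiation library rather than plain NumPy.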