Immersive virtual reality (VR) applications require ultra-high data rates and low latency for smooth operation. In this paper, we propose a proactive, deep-learning-aided joint scheduling and content quality adaptation scheme for multi-user VR field-of-view (FoV) wireless video streaming. Using a real VR head-tracking dataset, a deep recurrent neural network (DRNN) based on gated recurrent units (GRUs) is leveraged to predict users' upcoming tiled FoVs. Subsequently, to exploit physical-layer FoV-centric millimeter wave (mmWave) multicast transmission, users are hierarchically clustered according to their predicted FoV similarity and location. We pose the problem as a quality admission maximization problem under tight latency constraints, and adopt the Lyapunov framework to model the dynamic admission and scheduling of proactive and real-time high-definition (HD) video chunk requests, each corresponding to a tile in the FoV of a cluster user for a given video frame, while maintaining system stability. After decoupling the problem into three subproblems, a matching theory game is proposed to solve the scheduling subproblem by associating chunk requests from clusters of users with mmWave small cell base stations (SBSs) for multicast transmission. Simulation results demonstrate the streaming quality gain and latency reduction achieved by the proposed scheme. They show that FoV prediction significantly improves the VR streaming experience by enabling proactive scheduling of the video tiles in the users' future FoV. Moreover, multicasting significantly reduces the VR frame delay in a multi-user setting by reusing content within clusters of users with highly overlapping FoVs.
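
As an illustration of the FoV-based user clustering step, the following is a minimal sketch and not the scheme evaluated in the paper. It assumes each user's predicted FoV is available as a set of tile indices, measures FoV similarity with the Jaccard index, and merges users by average-linkage agglomerative clustering until no pair of clusters overlaps by at least a threshold. The names `jaccard`, `cluster_by_fov`, and `min_similarity`, the choice of similarity measure, and the example tile sets are all assumptions made for illustration; user location, which the proposed scheme also takes into account, is omitted here.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of predicted tile indices (1.0 = identical FoV)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_fov(fov_tiles, min_similarity=0.5):
    """Average-linkage agglomerative clustering of users by predicted FoV overlap.

    fov_tiles: dict mapping a user id to the set of tile indices predicted
               to fall in that user's FoV (hypothetical input format).
    Returns a sorted list of clusters, each a sorted list of user ids.
    """
    clusters = [[u] for u in sorted(fov_tiles)]

    def linkage(c1, c2):
        # Average pairwise Jaccard similarity between members of two clusters.
        sims = [jaccard(fov_tiles[u], fov_tiles[v]) for u in c1 for v in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Pick the most similar pair of clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) < min_similarity:
            break  # no remaining pair overlaps enough to share a multicast
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Illustrative (made-up) predicted tile sets for four users.
predicted = {
    "u1": {1, 2, 3, 4},
    "u2": {2, 3, 4, 5},
    "u3": {10, 11, 12},
    "u4": {10, 11, 12, 13},
}
clusters = cluster_by_fov(predicted)
print(clusters)
```

In this toy input, users u1 and u2 share three of five tiles and u3 and u4 share three of four, so each pair forms one cluster whose common tiles could be delivered in a single multicast transmission, while the two clusters themselves have no tiles in common and remain separate.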