We present an efficient approach for monocular 4D facial avatar reconstruction using a dynamic neural radiance field (NeRF). NeRFs have become a popular 3D scene representation, but they lack computational efficiency and controllability, making them impractical for real-world applications such as AR/VR, teleconferencing, and immersive experiences. The recent introduction of grid-based encoding in InstantNGP has made NeRF rendering substantially faster, but it is limited to static 3D scenes. To address these issues, we develop a novel dynamic NeRF that allows explicit control over pose and facial expression while retaining computational efficiency. By leveraging a low-dimensional basis from a 3D morphable model (3DMM) together with carefully designed spatial and ambient encoding branches, we condition a dynamic radiance field in an ambient space, improving controllability and visual quality. Our model renders approximately 30x faster at training and 100x faster at inference than the baseline (NeRFace), making it practical for real-world applications. Through qualitative and quantitative experiments, we demonstrate the effectiveness of our approach: the dynamic NeRF exhibits superior controllability, enhanced 3D consistency, and improved visual quality. Our efficient model opens new possibilities for real-time applications such as AR/VR and teleconferencing.