Eye tracking has emerged as a promising hands-free interaction mechanism to support people with disabilities. However, its adoption as a control mechanism in gaming environments is constrained by erroneous recognition of user intentions and commands. Previous studies have suggested combining eye gaze with other modalities, such as voice input, to improve the interaction experience. However, speech recognition latency and accuracy remain major bottlenecks, and dictated verbal commands can disrupt the flow of gameplay. Furthermore, many people with physical disabilities also have speech impairments that prevent them from uttering precise verbal commands. In this work, we introduce nonverbal voice interaction (NVVI) synchronized with gaze for an intuitive hands-free gaming experience. We propose combining gaze and NVVI (e.g., humming) for spatio-temporal interaction applicable to many modern gaming apps, and developed 'All Birds Must Fly' as a representative app. In the experiment, we first compared the gameplay experience of gaze and NVVI (GV) with the conventional mouse and keyboard (MK) in a study with 15 non-disabled participants. The participants could effectively control the game environment with GV (as expected, somewhat more slowly than with MK). More importantly, they found GV more engaging, fun, and enjoyable. In a second study with 10 participants, we validated the feasibility of GV with the target user group of people with disabilities.
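The abstract does not describe the underlying detection pipeline, so the following is only a minimal sketch of how gaze and NVVI could be fused: gaze supplies the spatial component (where), and sustained humming supplies the temporal component (when, and for how long). The gaze and microphone streams are simulated so the example runs without hardware, and the names and parameters (gaze_sample, audio_rms, RMS_THRESHOLD) are illustrative assumptions, not the authors' implementation.

```python
"""Minimal sketch of gaze + nonverbal voice (humming) fusion.

Gaze and microphone input are simulated; a real system would replace
gaze_sample() with an eye-tracker SDK and audio_rms() with a loudness
estimate computed from a live microphone stream.
"""
import math
import random

RMS_THRESHOLD = 0.05   # assumed loudness level that counts as "humming"
FRAME_RATE_HZ = 30     # assumed rate at which gaze/audio frames arrive


def gaze_sample(t: float) -> tuple[float, float]:
    """Simulated normalized gaze coordinates in [0, 1]."""
    return 0.5 + 0.3 * math.cos(t), 0.5 + 0.3 * math.sin(t)


def audio_rms(t: float) -> float:
    """Simulated microphone loudness: 'humming' occurs between t=1s and t=2s."""
    humming = 1.0 < t < 2.0
    return (0.2 if humming else 0.01) + random.uniform(0.0, 0.005)


def run(duration_s: float = 3.0) -> None:
    """Fuse the two streams: while loudness exceeds the threshold, steer toward gaze."""
    frames = int(duration_s * FRAME_RATE_HZ)
    for i in range(frames):
        t = i / FRAME_RATE_HZ
        x, y = gaze_sample(t)
        if audio_rms(t) > RMS_THRESHOLD:
            # Humming acts as a sustained, hands-free "button press":
            # the controlled game object is driven toward the current gaze point.
            print(f"t={t:4.2f}s  humming -> steer toward ({x:.2f}, {y:.2f})")
        # When no humming is detected, gaze is only observed, so looking
        # around by itself triggers nothing (avoiding the Midas-touch problem
        # of gaze-only control).


if __name__ == "__main__":
    run()
```

The design point this sketch illustrates is the division of labor implied by the abstract: gaze alone never issues a command, and the nonverbal voice signal requires no recognition of words, only a loudness (or pitch) decision, which sidesteps speech recognition latency and accuracy issues.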