Voice- and gaze-based hands-free input are increasingly used in human-machine interaction. Attempts to combine them into a hybrid technology typically employ the voice channel as an information-rich channel. Voice seems to be “overqualified” to serve simply as a substitute for a computer mouse click that confirms selections made by gaze. It could be expected that users would feel discomfort or easily get bored if they had to make frequent “clicks” with their voice, which could also lead to low performance. To test this, we asked 23 healthy participants to select moving objects with smooth pursuit eye movements. Manual confirmation of selection was faster and rated as more convenient than voice-based confirmation. However, the difference was not large, especially when voice was used to pronounce objects’ numbers (speech recognition was not applied): the convenience score (M ± SD) was 9.2 ± 1.1 for manual and 8.0 ± 2.1 for voice confirmation, and time spent per object was 1269 ± 265 ms and 1626 ± 331 ms, respectively. We conclude that “voice-as-click” can be used to confirm selection in gaze-based interaction with computers as a substitute for the computer mouse click when manual confirmation cannot be used.