Speech recognition is becoming increasingly important in modern society, especially for human–machine interactions. Its deployment, however, is still severely hampered because machines struggle to recognize voice commands in challenging real-life settings: ambient noise often drowns out the acoustic signal, and walls, face masks, or other obstacles hide the mouth motion from optical sensors. To address these challenges, an experimental prototype of a microwave speech recognizer empowered by a programmable metasurface is presented here. It can remotely recognize human voice commands and speaker identities even in noisy environments and even when the speaker's mouth is hidden behind a wall or a face mask. The programmable metasurface is the pivotal hardware ingredient of the system because its large aperture and huge number of degrees of freedom allow the system to perform a complex sequence of sensing tasks, orchestrated by artificial-intelligence tools. Because it relies solely on microwave data, the system avoids infringing on visual privacy. Experiments demonstrate that the developed microwave speech recognizer can enable privacy-respecting, voice-commanded human–machine interactions in many important but previously inaccessible application scenarios. The presented strategy is expected to unlock new possibilities for future smart homes, ambient-assisted health monitoring, and intelligent surveillance and security.