Highly sensitive, source-tracking acoustic sensing is essential for effective and natural voice-based human-machine interaction. Tracking sound sources omnidirectionally with high sensitivity and low noise interference using a compact sensor remains a known challenge. Here, we present a unibody acoustic metamaterial spherical shell with equidistant defected piezoelectric cavities, referred to as the metasphere beamforming acoustic sensor (MBAS). It demonstrates a wave-confining capability and low self-noise, simultaneously achieving an outstanding intrinsic signal-to-noise ratio (72 dB) and an ultrahigh sensitivity (137 mVpp/Pa or −26.3 dBV), with a range spanning the daily phonetic frequencies (0 to 1500 Hz) and omnidirectional beamforming for the perception and spatial filtering of sound sources. Moreover, the MBAS-based auditory system is demonstrated for high-performance audio cloning, source localization, and speech recognition in a noisy environment without any signal enhancement, revealing its promising applications in various voice interaction systems.