In digital interfaces with multimodal audio–visual presentation, irrelevant information often causes cognitive interference or even confusion, leading to decision‐making errors when users attend to or manipulate an interface target. However, few studies have explored the brain's inhibition effect and the cognitive patterns evoked by audio–visual interference from the perspective of interface information design. Building on the classic Stroop interference task, this study designed an experimental paradigm that uses multimodal audio–visual stimuli in digital interfaces to induce event‐related potential (ERP) components. Combining behavioral measurement with ERP technology, the study examined differences in the inhibition effects induced by two information carriers, functional icons and Chinese characters, under various audio–visual interference conditions. The findings demonstrated that all five interference stimuli, for both functional icons and Chinese characters, elicited significant N250 and N400 components with a similar time course. Compared with the Chinese character group, the functional icon group elicited more negative activity in the frontal and some parietal‐occipital regions, indicating that functional icons required more cognitive inhibitory resources to resist interference stimuli. Moreover, the inhibition effect induced by audio–visual interference with the same semantics was significantly lower than that induced by interference with opposite semantics, and even lower than that induced by single‐sensory interference. These findings offer physiological evidence for the inhibition effect induced by audio–visual semantic interference in digital interfaces and suggest design principles for the interface information of human–machine systems.