The advent of big data and the Internet of Things has created an urgent demand for in‐sensor computing hardware with multimodal perception that can overcome the inefficiency, high latency, and excessive energy consumption of conventional sensory systems. Here, a simple‐structured optoelectronic synaptic device based on an In2O3·SnO2/Nb:SrTiO3 (ITO/NSTO) heterostructure is proposed, which demonstrates both in‐sensor computing and multimodal perception capabilities. First, exploiting the synaptic responses of the device to both optical and electrical stimuli, a multimodal in‐sensor neuromorphic computing system capable of concurrently perceiving and processing visual and auditory information is constructed. When this multimodal system is applied to a human emotion recognition task, the misjudgments arising from single‐modal cognition are effectively avoided. Second, utilizing the integrated sensing and processing functions together with the dynamic memory of the device array, a neuromorphic vision system is implemented for real‐time monitoring of moving vehicles, achieving high recognition efficiency and accuracy. This research not only provides a low‐cost, easily mass‐produced optoelectronic synaptic device, but also paves the way for next‐generation in‐sensor computing that can efficiently process multimodal signals.