Recently, in-sensor computing, in which individual sensors or multiple connected sensors process information directly, has been proposed to improve the energy, area, and time efficiency of artificial intelligence systems. Current investigations mainly focus on a single sensory modality, such as auditory, visual, tactile, or olfactory processing. However, the human perception system can simultaneously sense and process different types of information in a complex environment and within a small perceptive field. For example, the recognition accuracy of the human eye is strongly affected by environmental conditions such as extremely low or high relative humidity (RH). Here, a multimodal MXene-ZnO memristor that combines visual data sensing, RH sensing, and preprocessing functions is designed and constructed to emulate the environment-adaptive behavior of the human eye. The multi-field-controlled resistive switching of the MXene-ZnO memristor originates from the photon- and proton-regulated formation of oxygen-vacancy filaments. Finally, in-sensor computing is demonstrated with a MXene-ZnO memristor that functions both as a filter to preprocess information and as a synapse to implement a weight-updating process that adapts to different humidity levels. Multimodal in-sensor computing offers the potential to reduce the circuit complexity of traditional neuromorphic visual systems and contributes to the development of device-level intelligence.