Context-sensitive two-point layer 5 pyramidal cells (L5PCs) were discovered as long ago as 1999. However, the potential of this discovery to provide useful neural computation has yet to be demonstrated. Here we show for the first time how a transformative L5PCs-driven deep neural network (DNN), termed the multisensory cooperative computing (MCC) architecture, can effectively process large amounts of heterogeneous realworld audio-visual (AV) data, using far less energy compared to best available 'point' neuron-driven DNNs. A novel highlydistributed parallel implementation on a Xilinx UltraScale+ MPSoC device estimates energy savings up to 245759 × 50000 µJ (i.e., 62% less than the baseline model in a semi-supervised learning setup) where a single synapse consumes 8e −5 µJ. In a supervised learning setup, the energy-saving can potentially reach up to 1250x less (per feedforward transmission) than the baseline model. The significantly reduced neural activity in MCC leads to inherently fast learning and resilience against sudden neural damage. This remarkable performance in pilot experiments demonstrates the embodied neuromorphic intelligence of our proposed cooperative L5PC that receives input from diverse neighbouring neurons as context to amplify the transmission of most salient and relevant information for onward transmission, from overwhelmingly large multimodal information utilised at the early stages of on-chip training. Our proposed approach opens new cross-disciplinary avenues for future on-chip DNN training implementations and posits a radical shift in current neuromorphic computing paradigms.