Radar imaging techniques, such as synthetic aperture radar, are widely explored in automatic vehicle recognition algorithms for remote sensing tasks. A large body of literature covering several machine learning methodologies, including vision transformers, self-attention, convolutional neural networks (CNN), long short-term memory (LSTM), CNN-LSTM, CNN-attention-LSTM, and CNN Bi-LSTM models, has reported high performance on military vehicle detection using combinations of these approaches. Trade-offs among differing numbers of poses, single versus multiple feature-extraction streams, the use of signals and/or images, and the specific mechanisms used to combine them have been widely debated. We propose adapting several of these models into a unique biologically inspired architecture that uses both multi-pose and multi-contextual image and signal radar sensor information to make vehicle assessments over time. A compact multi-pose 3D CNN single stream processes and fuses multi-temporal images, while a dual sister 2D CNN stream processes the same information over a lower-dimensional power-spectral domain, mimicking the way multi-sequence visual imagery is combined with auditory feedback for enhanced situational awareness. These data are then fused across domains using transformer-modified encoding blocks feeding Bi-LSTM segments. Classification on a fundamentally controlled simulated dataset yielded accuracies of up to 98% and 99%, in line with the literature. This performance was then evaluated for robustness, not previously explored, under three simultaneous parameterizations of incidence angle, object orientation, and lowered signal-to-noise ratio, and recognition was found to improve in all three cases in low- to moderate-noise environments.
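
To make the described architecture concrete, the following is a minimal PyTorch sketch of the dual-stream design: a 3D CNN over the multi-temporal image stack, a sister 2D CNN over the power-spectral representation, transformer encoder blocks fusing the two domain embeddings, and a Bi-LSTM segment before classification. All layer widths, kernel sizes, the two-token fusion scheme, and the names (DualStreamFusionNet, image_stream, spectral_stream) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DualStreamFusionNet(nn.Module):
    """Hypothetical sketch of the dual-stream fusion architecture; sizes are assumed."""

    def __init__(self, n_classes=10, d_model=128):
        super().__init__()
        # Single 3D CNN stream: fuses a multi-pose, multi-temporal image stack
        # of shape (batch, 1, frames, H, W).
        self.image_stream = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((4, 4, 4)),
        )
        # Sister 2D CNN stream: processes the lower-dimensional power-spectral
        # representation of the same returns, shape (batch, 1, freq, time).
        self.spectral_stream = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.image_proj = nn.Linear(32 * 4 * 4 * 4, d_model)
        self.spectral_proj = nn.Linear(32 * 4 * 4, d_model)
        # Transformer encoder blocks fuse the two domain embeddings, treated
        # here as a two-token sequence, before the recurrent stage.
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Bi-LSTM segment integrates the fused features.
        self.bilstm = nn.LSTM(d_model, 64, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 64, n_classes)

    def forward(self, images, spectra):
        # images: (B, 1, frames, H, W); spectra: (B, 1, freq, time)
        img = self.image_proj(self.image_stream(images).flatten(1))
        spec = self.spectral_proj(self.spectral_stream(spectra).flatten(1))
        tokens = torch.stack([img, spec], dim=1)   # (B, 2, d_model)
        fused = self.fusion(tokens)                # cross-domain self-attention
        out, _ = self.bilstm(fused)                # (B, 2, 2*64)
        return self.classifier(out[:, -1])         # class logits

model = DualStreamFusionNet()
logits = model(torch.randn(2, 1, 8, 64, 64), torch.randn(2, 1, 64, 64))
```

Treating the two domain embeddings as a short token sequence lets the encoder's self-attention weigh image against spectral evidence before the Bi-LSTM integrates it; the actual model may instead fuse at a finer granularity, for example per time step, which this sketch does not attempt to reproduce.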