Generalised wide-area search and surveillance is a commonplace task for autonomous systems equipped with multiple sensors. Here we address a key supporting topic for this task: the automatic interpretation, fusion and detected-target reporting of multi-modal sensor information received from multiple autonomous platforms deployed for wide-area environment search. We detail the realization of a real-time methodology for the automated detection of people and vehicles using combined visible-band (EO), thermal-band (IR) and radar sensing from a deployed network of multiple autonomous platforms (ground and aerial). This facilitates real-time target detection, reported with varying levels of confidence, using information from both multiple sensors and multiple sensor platforms to provide environment-wide situational awareness. A range of automatic classification approaches is proposed, driven by underlying machine learning techniques, that facilitate the automatic detection of either target type with cross-modal target confirmation. Extended results are presented that show the detection of both people and vehicles under varying conditions, in both isolated rural and cluttered urban environments, with minimal false positive detection. Performance is evaluated at an episodic level, with individual classifiers optimized to maximize the detection of each object of interest (vehicle/person) over a given search path/pattern of the environment, across all sensors and modalities, rather than on a per-sensor-sample basis. Episodic target detection, evaluated over a number of wide-area environment search and reporting tasks, generally exceeds 90% for the targets considered here.
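The distinction between episodic and per-sample evaluation can be illustrated with a minimal sketch. It is not the evaluation code used in this work: the `Sample` record layout, the function names, the 0.5 decision threshold and the toy scores are all assumptions introduced for illustration only. Under these assumptions, a target counts as detected in an episode if any sensor, in any modality, reports it above threshold at least once along the search path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One classifier output for one sensor sample (hypothetical layout)."""
    target_id: str     # ground-truth object visible in this sample
    sensor: str        # e.g. "EO", "IR" or "radar"
    confidence: float  # classifier score in [0, 1]


def per_sample_rate(samples, threshold=0.5):
    """Fraction of individual sensor samples scoring at or above threshold."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if s.confidence >= threshold)
    return hits / len(samples)


def episodic_rate(samples, targets, threshold=0.5):
    """Fraction of targets detected at least once over the whole episode,
    pooling every sensor and modality (assumed definition)."""
    if not targets:
        return 0.0
    detected = {s.target_id for s in samples if s.confidence >= threshold}
    return len(detected & set(targets)) / len(targets)


# Toy episode: three targets, each seen by two sensors (made-up scores).
episode = [
    Sample("person_1", "EO", 0.2),
    Sample("person_1", "IR", 0.9),
    Sample("vehicle_1", "EO", 0.3),
    Sample("vehicle_1", "radar", 0.8),
    Sample("vehicle_2", "EO", 0.1),
    Sample("vehicle_2", "IR", 0.4),
]
targets = ["person_1", "vehicle_1", "vehicle_2"]

print(per_sample_rate(episode))
print(episodic_rate(episode, targets))
```

In this toy episode only two of the six samples exceed the threshold, yet two of the three targets are detected at the episode level, because a single confident observation in any modality is sufficient. This is why an episodic figure can be substantially higher than a per-sample one for the same classifiers.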