Sustained attention (SA) is a critical cognitive ability that emerges in infancy. The recent development of wearable technology for infants enables the collection of large-scale multimodal data, including physiological signals, in the natural environment. To capitalize on these technologies, psychologists need methods that efficiently extract valid and robust SA measures from large datasets. In this study, we present an automatic sustained attention prediction (ASAP) method that uses electrocardiogram (ECG) and accelerometer (Acc) signals recorded with wearable sensors from 75 infants aged 6, 9, 12, 24, and 36 months. Infants took part in various naturalistic tasks similar to those encountered in their natural environment, including free play with their caregivers. Human-coded SA was validated against fixation signals from eye tracking. ASAP was trained on temporal and spectral features derived from the ECG and Acc signals to detect attention periods, and was tested against human-coded SA. ASAP's performance was similar across all age groups, demonstrating its suitability for studying development. We also investigated the relationship between attention periods and low-level perceptual features (visual saliency, visual clutter) extracted from egocentric videos recorded during caregiver-infant free play. Saliency was higher during attention than during inattention periods, and it decreased with age for attention (but not inattention) periods. Crucially, these results showed no observable difference between ASAP-detected attention and human-coded attention. Our results demonstrate that ASAP is a powerful tool for detecting infant SA elicited in natural environments. Combined with available wearable sensors, ASAP provides unprecedented opportunities for studying infant development in the wild.