“…Researchers have successfully applied CNN-based architectures to many visual tasks, such as people detection and tracking [126]–[128], pose estimation [129]–[134], action recognition [79], [135]–[158], and event detection and crowded scene understanding [159]–[162]. An early application of CNNs was presented in 1995 by Nowlan et al. [129] for hand tracking and recognition.…”