Vision-based object detection and understanding is an important problem in computer vision and signal processing. Owing to their high mobility and ease of deployment, unmanned aerial vehicles (UAVs) have become a flexible monitoring platform in recent years. However, visible-light-based methods are often strongly affected by environmental conditions. Moreover, a single type of feature derived from aerial monitoring videos is often insufficient to characterize the variations among different abnormal crowd behaviors. To address this, we propose combining two types of features to better represent crowd behavior: a multitask cascading CNN (MC-CNN) and multiscale infrared optical flow (MIR-OF), which capture the crowd density and the average speed of the crowd, respectively, and together describe the appearance of the crowd behaviors. First, an infrared (IR) camera and an NVIDIA Jetson TX1 were selected to form the infrared vision system. Since no infrared-based aerial abnormal-behavior datasets have been published, we provide a new infrared aerial dataset, named the IR-flying dataset, which includes sample pictures and videos from different scenes in public areas. Second, MC-CNN was used to estimate the crowd density. Third, MIR-OF was designed to characterize the average speed of the crowd. Finally, experiments on two typical abnormal crowd behaviors, crowd aggregating and crowd escaping, show that the monitoring UAV system can effectively detect abnormal crowd behaviors in public areas.
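As a rough illustration of how a density cue and a speed cue might be combined, the following minimal Python sketch computes an average speed from a single-scale flow field and applies a simple threshold rule. It is not the method described above: the function names, the noise floor, the thresholds `growth_ratio` and `speed_limit`, and the decision rule itself are all assumptions made for illustration, and the head count is assumed to be supplied by a separate density estimator.

```python
from math import hypot


def average_speed(flow, min_magnitude=0.5):
    """Mean magnitude (pixels/frame) of flow vectors above a noise floor.

    `flow` is a 2-D grid of (dx, dy) displacements, e.g. one scale of a
    dense optical-flow field. `min_magnitude` is a hypothetical noise floor.
    """
    magnitudes = [hypot(dx, dy) for row in flow for dx, dy in row]
    moving = [m for m in magnitudes if m > min_magnitude]
    return sum(moving) / len(moving) if moving else 0.0


def classify_behavior(count, prev_count, speed, growth_ratio=1.5, speed_limit=3.0):
    """Toy rule (assumed): fast motion -> escaping; rapid growth -> aggregating."""
    if speed > speed_limit:
        return "escaping"
    if prev_count > 0 and count / prev_count >= growth_ratio:
        return "aggregating"
    return "normal"


# Hypothetical 2x2 flow field and head counts for two consecutive windows.
flow = [[(3.0, 4.0), (0.0, 0.0)], [(6.0, 8.0), (0.1, 0.0)]]
speed = average_speed(flow)
print(speed, classify_behavior(count=22, prev_count=20, speed=speed))
```

In practice the thresholds would have to be calibrated against labeled footage, and the flow field would come from an optical-flow estimator applied to consecutive infrared frames.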