Target object detection in aerial images suffers from several problems: low localization accuracy for multi-scale targets, slow detection, missed targets, and false predictions. To address these problems, this paper proposes an improved You Only Look Once (YOLO) algorithm designed for model efficiency, combining target box dimension clustering, classification pre-training of the network, multi-scale detection training, and modified screening rules for candidate boxes. The modified approach is better suited to the positioning task: the target area can be located in real time in aerial images from an unmanned aerial vehicle (UAV), and a projection relation converts the detected location to latitude and longitude from the UAV's position. The results proved more effective; notably, the average accuracy of the detection network on the target-area detection task in aerial images increased to 79.5%. Aerial images containing the target area were used in a flight-simulation experiment to verify the network's positioning accuracy, which exceeded 84%. The proposed model can therefore be used for real-time detection of multi-scale targets with a reduced misprediction rate owing to its superior accuracy.
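The abstract does not give implementation details, but target box dimension clustering is commonly implemented as k-means over ground-truth box widths and heights with a 1 − IoU distance (as in YOLOv2's dimension clusters). The sketch below is a minimal Python version under that assumption; the synthetic boxes and the choice of k = 9 are illustrative only, not taken from the paper.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes share a common top-left corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None] +
             (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster ground-truth box dimensions using distance = 1 - IoU."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(axis=1)  # nearest anchor by IoU
        new = np.array([boxes[assign == i].mean(axis=0) if (assign == i).any()
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # sort anchors by area

# Example: cluster 1000 synthetic (width, height) pairs into 9 anchor boxes.
boxes = np.abs(np.random.default_rng(1).normal(50, 20, size=(1000, 2))) + 1
print(kmeans_anchors(boxes, k=9))
```

Clustering with 1 − IoU rather than Euclidean distance keeps large and small boxes comparable, which matters for the multi-scale detection the paper targets.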
Automatic emotion recognition, the analysis of a human's emotional state, has been an active research area over the past decades. It remains a challenging task in computer vision and artificial intelligence due to high intra-class variation. A key benefit of emotion recognition is that a person's emotion can be recognized even at a great distance from the surveillance camera; however, since the camera is far from the subject, it is difficult to identify emotion from facial expression alone. Recognition in this scenario improves when visual body cues (facial actions, hand posture, body gestures) are added, and body posture in particular can powerfully convey a person's emotional state. This paper analyses frontal views of human body movements, visual expressions, and body gestures to identify emotions. We first extract the motion information of the body gestures using dense optical flow models; the resulting high-level motion feature frames are then passed to pre-trained convolutional neural network (CNN) models to recognize the 17 emotions in the Geneva multimodal emotion portrayals (GEMEP) dataset. In the experiments, the motion features outperform raw frames: AlexNet achieves an overall accuracy of 96.63% on the GEMEP dataset, compared with 94% for visual geometry group-19 (VGG-19) and 93.35% for VGG-16. This shows that the dense optical flow method combined with transfer learning performs well for recognizing emotions.
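The pipeline named in the abstract (dense optical flow rendered into motion frames, then a pre-trained CNN) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes OpenCV's Farneback dense flow, the common HSV flow visualization, and torchvision's pre-trained AlexNet with its final layer replaced for the 17 GEMEP classes; the random frames stand in for consecutive video frames.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

def flow_to_rgb(prev_gray, next_gray):
    """Dense Farneback optical flow rendered as an HSV -> RGB motion image."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2  # hue encodes motion direction
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value = speed
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

# Pre-trained AlexNet with its classifier head replaced for 17 emotion classes
# (the head replacement is an assumption; the paper only names the backbones).
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(4096, 17)
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Example with two synthetic grayscale frames standing in for video frames.
f0 = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
f1 = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
motion_img = flow_to_rgb(f0, f1)
with torch.no_grad():
    logits = model(preprocess(motion_img).unsqueeze(0))
print(logits.shape)  # torch.Size([1, 17])
```

Feeding the rendered motion image, rather than the raw frame, is what lets the ImageNet-pre-trained backbone see gesture dynamics, which is consistent with the abstract's finding that motion features beat raw frames.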