At present, many illegal activities are being carried out, such as illegal mining, hunting, logging, and forest burning. These activities can have a substantial negative impact on the environment, and they are increasingly rampant because of the limited number of officers and the high cost of monitoring. One possible solution is a surveillance system that utilizes artificial intelligence to monitor the area. Unmanned aerial vehicles (UAVs) combined with NVIDIA Jetson modules (general-purpose GPUs) can be inexpensive and efficient because they use few resources. The challenge of object detection from a drone's perspective is that the objects are relatively small compared to the observation space, and there are also illumination and environmental challenges. In this study, we demonstrate the use of the state-of-the-art object-detection method You Only Look Once (YOLO) v5 on a dataset of visual (RGB) images taken from a UAV, along with thermal infrared (TIR) information, to detect poachers. We employed seven training scenarios with RGB and thermal infrared data to find the best model for later deployment on the Jetson Nano module. The experimental results show that a model trained with transfer learning from weights pre-trained on the MS COCO dataset improves YOLOv5's ability to detect the human object in the RGBT image dataset.