Threats posed by drones have urged defence sectors worldwide to develop drone detection systems. Visible-light and infrared cameras complement other sensors in detecting and identifying drones. Convolutional Neural Networks (CNNs), such as the You Only Look Once (YOLO) algorithm, are known to detect drones quickly in video footage captured by these cameras and to robustly differentiate them from other flying objects such as birds, thus avoiding false positives. However, training the CNN on still video frames alone suffers from low drone-to-background contrast when the drone flies in front of clutter, and omits useful temporal information such as the flight trajectory. This degrades drone detection performance, especially as the distance to the target increases. This work proposes to pre-process the video frames using a Bio-Inspired Vision (BIV) model of insects, and to concatenate the pre-processed video frame with the still frame as input for the CNN. The BIV model uses information from preceding frames to enhance the moving target-to-background contrast and to embed the target's recent trajectory in the input frames. An open benchmark dataset containing infrared videos of small drones (< 25 kg) and other flying objects is used to train and test the proposed methodology. Results show that, at a high sensor-to-target distance, YOLO trained on BIV-processed frames alone and on the concatenation of BIV-processed frames with still frames achieves an Average Precision (AP) of 0.92 and 0.88, respectively, compared to 0.83 when trained on still frames alone.
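
To make the input-construction step concrete, the following is a minimal sketch of how a motion-enhanced frame might be concatenated with the current still frame along the channel axis before being fed to the detector. The function names (`biv_like_enhance`, `build_cnn_input`) are hypothetical, and the temporal high-pass used here is only a simple stand-in for the paper's insect-vision BIV model, not the model itself.

```python
import numpy as np

def biv_like_enhance(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the paper's BIV model: a simple temporal high-pass
    over preceding grayscale frames that emphasises moving targets.
    `frames` has shape (T, H, W), oldest first, values in [0, 1]."""
    # An exponentially weighted running average approximates the static background.
    weights = np.exp(np.linspace(-1.0, 0.0, len(frames)))
    background = np.tensordot(weights / weights.sum(), frames, axes=(0, 0))
    # Deviation of the latest frame from the background highlights motion,
    # leaving a faint trail of the target's recent trajectory.
    motion = np.abs(frames[-1] - background)
    return motion / (motion.max() + 1e-8)

def build_cnn_input(frame_window: np.ndarray) -> np.ndarray:
    """Concatenate the motion-enhanced frame with the current still frame
    along the channel axis, yielding an (H, W, 2) array for the detector."""
    still = frame_window[-1]
    enhanced = biv_like_enhance(frame_window)
    return np.stack([still, enhanced], axis=-1)

# Example: a 10-frame window of 256x256 infrared imagery.
window = np.random.rand(10, 256, 256).astype(np.float32)
cnn_input = build_cnn_input(window)
print(cnn_input.shape)  # (256, 256, 2)
```

Under this scheme the detector's first convolutional layer would need its input channel count adjusted to match the concatenated input; how the paper's YOLO configuration handles this is not specified in the abstract.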