Pedestrian detection has never been an easy task for computer vision or the automotive industry. Advanced driver-assistance systems (ADAS) rely heavily on captured far-infrared (FIR) data to detect pedestrians at night. Recent deep learning-based detectors have achieved excellent pedestrian detection results in ideal weather conditions; however, their performance in adverse weather conditions remains unknown. In this paper, we introduce ZUT (Zachodniopomorski Uniwersytet Technologiczny), a 16-bit thermal dataset with the widest variety of fine-grained annotated images, captured in the four largest European Union countries during severe weather conditions. We also provide synchronized Controller Area Network (CAN bus) data, including driving speed, brake pedal status, and outside temperature, for future ADAS development. Furthermore, we tested and provide 16-bit depth modifications of the YOLOv3 deep neural network (DNN) based detector, which reach a mean Average Precision (mAP) of up to 89.1%. The ZUT dataset is publicly available at IEEE Dataport and GitHub.
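The abstract does not say what the 16-bit depth modifications to YOLOv3 involve. The Python/NumPy sketch below illustrates one plausible motivation, assuming the key change is to keep the full 16-bit dynamic range when scaling raw thermal frames to network inputs instead of truncating them to 8 bits first. The function names `to_8bit`, `normalize_16bit`, and `minmax_stretch` are hypothetical and are not taken from the ZUT code.

```python
import numpy as np


def to_8bit(raw):
    """Naive 8-bit conversion: keep only the 8 most significant bits."""
    return (raw >> 8).astype(np.uint8)


def normalize_16bit(raw):
    """Scale a uint16 frame to float32 in [0, 1] using the full 16-bit range."""
    if raw.dtype != np.uint16:
        raise TypeError("expected a uint16 thermal frame")
    return raw.astype(np.float32) / 65535.0


def minmax_stretch(raw):
    """Per-frame min-max stretch to [0, 1].

    Assumption: raw thermal readings often occupy a narrow band of the
    16-bit range, so stretching that band can help a detector.
    """
    f = raw.astype(np.float32)
    lo, hi = f.min(), f.max()
    if hi == lo:
        # A flat frame carries no contrast; return zeros instead of dividing by 0.
        return np.zeros_like(f)
    return (f - lo) / (hi - lo)
```

For a toy frame with raw readings 30000, 30050, and 30100, all three collapse to the same 8-bit value under `to_8bit` but remain distinct after `normalize_16bit`, which is the kind of fine temperature detail a 16-bit input path is meant to preserve.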
Convolutional neural networks (CNNs) are a computationally and memory-intensive class of deep neural networks. Because of this high computational complexity, field-programmable gate arrays (FPGAs) are often used to accelerate CNNs deployed on embedded platforms. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that maps a trained CNN onto a SoC FPGA. The processing system (PS) configures the convolutional and fully connected cores according to a list of prescheduled instructions, while the programmable logic (PL) holds the cores of the convolutional and fully connected layers. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging of the convolution with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The fully connected (FC) layer core runs at 100 MHz and supports up to 4096 neurons per layer. In the current version of the CNN core, the convolutional kernel size is fixed at 3×3. The estimated average performance is 8.6 GOPS for VGG13 and nearly 8.4 GOPS for VGG16/19 networks.
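The abstract states that kernel binarization and merging with batch normalization reduce DSP usage, but not how they are combined. The Python/NumPy sketch below shows one common way these ideas fit together, under stated assumptions: 3×3 kernels are binarized to ±1 with an XNOR-Net-style per-output-channel scale (the mean absolute weight), that scale and the BN parameters are folded into a single per-channel multiply-add, and constants are stored as signed 16-bit fixed point with a hypothetical 8 fractional bits. All function names are illustrative and are not part of mNet2FPGA.

```python
import numpy as np


def binarize_kernels(w):
    """Binarize kernels w of shape (C_out, C_in, 3, 3).

    Returns sign kernels in {-1, +1} and a per-output-channel scale
    alpha = mean(|w|). The mean-|w| scale is an XNOR-Net-style assumption,
    not necessarily what mNet2FPGA uses.
    """
    alpha = np.mean(np.abs(w), axis=(1, 2, 3))
    b = np.where(w >= 0, 1.0, -1.0)
    return b, alpha


def fold_bn(alpha, conv_bias, gamma, beta, mean, var, eps=1e-5):
    """Merge the binarization scale and a BatchNorm layer into one affine op.

    BN(alpha * S + c) == scale * S + shift, where S is the binary-kernel sum:
      scale = gamma * alpha / sqrt(var + eps)
      shift = gamma * (c - mean) / sqrt(var + eps) + beta
    """
    inv_std = 1.0 / np.sqrt(var + eps)
    scale = gamma * alpha * inv_std
    shift = gamma * (conv_bias - mean) * inv_std + beta
    return scale, shift


def conv3x3_binary(x, b):
    """'Valid' 3x3 cross-correlation of x (C_in, H, W) with sign kernels b.

    Because every kernel entry is +1 or -1, each output is a signed sum of
    inputs, which hardware can compute with adders instead of multipliers.
    """
    c_out = b.shape[0]
    _, h, w = x.shape
    s = np.zeros((c_out, h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = x[:, i:i + 3, j:j + 3]  # shape (C_in, 3, 3)
            s[:, i, j] = np.sum(b * patch, axis=(1, 2, 3))
    return s


def binary_conv_bn(x, b, scale, shift):
    """Binary convolution followed by folded BN: one multiply-add per output."""
    s = conv3x3_binary(x, b)
    return scale[:, None, None] * s + shift[:, None, None]


def to_fixed16(v, frac_bits=8):
    """Quantize to signed 16-bit fixed point with frac_bits fractional bits
    (the Q-format split is a hypothetical choice), saturating on overflow."""
    q = np.round(np.asarray(v, dtype=np.float64) * (1 << frac_bits))
    return np.clip(q, -32768, 32767).astype(np.int16)
```

In this sketch, the 3×3×C_in accumulation uses only ±1 weights, so the only true multiplication left per output value is the folded `scale`. That illustrates why such merging can cut DSP usage; the actual hardware mapping in the paper may differ.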