In many modern industries, production lines are fast-paced environments with repetitive and intricate motions, where humans and machines often coexist. Manufacturers continually seek ways to minimize breakdowns and failures in order to improve productivity and efficiency. This work is an outcome of the collaborative R&D project VIEXPAND AI, a real-time AI-boosted solution that complements and expands human supervision with 24/7 'smart eyes' in a container glass industry application. The goal is to reduce production downtime, accidents, and waste of raw materials and energy, as well as to improve industrial working conditions. To accomplish this, we propose an architecture in which AI methods and techniques are implemented at the edge, allowing real-time supervision of multiple sites with centralized remote monitoring. FPGA System-on-Chip (SoC) devices are used to implement the video processing, multiplexing, and encoding/decoding stages, as well as the AI engine used for object detection and classification. This heterogeneous technology allows processing tasks to be distributed over the different hardware modules available on-chip (the multiprocessor unit, hard cores, and soft cores), thus enabling real-time operation. This paper evaluates the use of YOLOX models on the Deep Learning Processing Unit (DPU) of a Xilinx Zynq® UltraScale+™ Multiprocessor System-on-Chip (MPSoC). It presents a study of the models' performance when trained with different input sizes and with custom datasets obtained on the factory floor. The impact of different design choices on performance metrics is reported and discussed.