Dispatching, receiving, and transporting goods involve a large amount of manual effort. Within a logistics supply chain, a wide variety of transported goods must be handled, recognized, and checked at many different points. As the need for efficient warehouse logistics in manufacturing systems has grown, so has the use of automated guided vehicles (AGVs) to reduce working time, and effective planning of AGV transportation can reduce equipment energy consumption and shorten task completion time. These processes hold automation potential that can be exploited with computer vision techniques. We propose a method for fully automated box recognition that identifies both the types and the quantities of boxes. To this end, we propose an ELAN and GhostConv-based YOLO network (EGCY-Net) built from a Conv-GhostConv Stack (CGStack) module and an ELAN-GhostConv Network (EGCNet). The CGStack module enhances inter-channel relationships and captures complex patterns in the image, using ghost convolution to increase inference speed while retaining the ability to capture spatial features. EGCNet is constructed from ELAN and the CGStack module to capture and utilize hierarchical features efficiently through layer aggregation. We also create a dataset of box images taken in warehouse settings. The proposed system is implemented on the NVIDIA Jetson Nano platform with an Arducam IMX477 camera. To evaluate the proposed model, we conducted experiments on our own dataset and compared the results with several state-of-the-art (SOTA) models. Among the compared models, the proposed network achieved the highest detection accuracy with the fewest parameters.
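
For intuition on why ghost convolution helps keep the parameter count low, the following minimal sketch counts the weights of a standard convolution and of a ghost convolution in its commonly described form: a primary convolution produces a fraction of the output channels, and a cheap depthwise convolution derives the rest. The function names, channel sizes, `ratio`, and `cheap_k` are illustrative assumptions, not the EGCY-Net configuration, which is not specified here.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a ghost convolution (bias ignored).

    A primary k x k convolution produces c_out // ratio intrinsic feature
    maps; a depthwise cheap_k x cheap_k convolution derives the remaining
    (ratio - 1) "ghost" maps per intrinsic map.
    NOTE: ratio and cheap_k are assumed values chosen for illustration.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
    std = conv_params(128, 256, 3)
    ghost = ghost_conv_params(128, 256, 3)
    print(f"standard: {std}, ghost: {ghost}, reduction: {std / ghost:.2f}x")
```

For this hypothetical layer the ghost variant needs roughly half the weights of the standard convolution, which is the kind of saving a ghost-convolution-based module relies on.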