A 1-millisecond (1-ms) vision system that guarantees high efficiency and timely response for tomato defect detection is essential for factory automation. Because defects vary widely in appearance, much recent research has focused on CNN-based defect detection, but few studies attempt to reach the processing speed required by a factory assembly line. This paper proposes a global-to-multi-scale-local parallel architecture with a hardwired CNN for tomato defect detection. The architecture breaks image-wise detection down into pixel-wise localization and block-wise classification. The pixel-wise localization uses tomato-aware information as constraints to improve localization performance. The block-wise classification uses a fully pipelined network structure that produces the classification result for each block as the pixel stream moves through the network. The classification network is a lightweight six-layer structure, quantized for hardwired implementation on an FPGA. Experimental results show that the proposed architecture processes images at 1000 FPS with a delay of 0.9476 ms/frame, within the 1-ms target. In terms of detection performance, the architecture achieves 80.18%, only 1.31% lower than a ResNet50-based detection system.
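The decomposition of image-wise detection into pixel-wise localization and block-wise classification can be sketched in plain Python. This is a minimal software illustration under stated assumptions, not the authors' FPGA pipeline: the color rule `red_dominant` (standing in for the tomato-aware constraint), the brightness rule `dark_patch` (standing in for the six-layer CNN), the block sizes `(4, 8)`, and all function names are hypothetical. It also assumes an image is flagged as defective if any block is. It only shows the control flow: a global pixel-wise mask restricts which blocks reach the classifier, and blocks are examined at several scales.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), 0-255
Image = List[List[Pixel]]             # row-major grid of pixels


def localize(image: Image, is_tomato: Callable[[Pixel], bool]) -> List[List[bool]]:
    """Global, pixel-wise step: mark every pixel as tomato (True) or background."""
    return [[is_tomato(p) for p in row] for row in image]


def candidate_blocks(mask: List[List[bool]], size: int) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping size x size blocks that touch the tomato."""
    h, w = len(mask), len(mask[0])
    corners = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            if any(mask[y + dy][x + dx] for dy in range(size) for dx in range(size)):
                corners.append((y, x))
    return corners


def detect(image: Image,
           is_tomato: Callable[[Pixel], bool],
           classify_block: Callable[[Image], bool],
           scales: Sequence[int] = (4, 8)) -> bool:
    """Local, block-wise step at several scales (assumption: any defective block flags the image)."""
    mask = localize(image, is_tomato)
    for size in scales:
        for y, x in candidate_blocks(mask, size):
            block = [row[x:x + size] for row in image[y:y + size]]
            if classify_block(block):
                return True
    return False


# Hypothetical stand-ins for the tomato-aware constraint and the CNN classifier.
def red_dominant(p: Pixel) -> bool:
    r, g, _ = p
    return r > 120 and r > g + 40


def dark_patch(block: Image) -> bool:
    reds = [p[0] for row in block for p in row]
    return sum(reds) / len(reds) < 160


healthy: Image = [[(220, 40, 30)] * 8 for _ in range(8)]
bruised: Image = [row[:] for row in healthy]
for y in range(4, 8):
    for x in range(4, 8):
        bruised[y][x] = (130, 40, 30)   # darker, but still tomato-colored

print(detect(healthy, red_dominant, dark_patch))   # False
print(detect(bruised, red_dominant, dark_patch))   # True
```

In the hardwired design described above, each block is classified as the pixel stream passes through the pipelined network, rather than by looping over stored blocks as this sketch does; the loop form is used here only for readability.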