In the semiconductor industry, automated visual inspection aims to improve the detection and recognition of manufacturing defects by leveraging the power of artificial intelligence and computer vision systems, enabling manufacturers to profit from an increased yield and reduced manufacturing costs. Previous domain-specific contributions often utilized classical computer vision approaches, whereas more novel systems deploy deep learning based ones. However, a persistent problem in the domain stems from the recognition of very small defect patterns which are often in the size of only a few $$\mu $$
μ
m and pixels within vast amounts of high-resolution imagery. While these defect patterns occur on the significantly larger wafer surface, classical machine and deep learning solutions have problems in dealing with the complexity of this challenge. This contribution introduces a novel hybrid multistage system of stacked deep neural networks (SH-DNN) which allows the localization of the finest structures within pixel size via a classical computer vision pipeline, while the classification process is realized by deep neural networks. The proposed system draws the focus over the level of detail from its structures to more task-relevant areas of interest. As the created test environment shows, our SH-DNN-based multistage system surpasses current approaches of learning-based automated visual inspection. The system reaches a performance (F1-score) of up to 99.5%, corresponding to a relative improvement of the system’s fault detection capabilities by 8.6-fold. Moreover, by specifically selecting models for the given manufacturing chain, runtime constraints are satisfied while improving the detection capabilities of currently deployed approaches.