Vision-based object detection of PCB (printed circuit board) assembly scenes is essential for accelerating the intelligent production of electronic products. In particular, detection accuracy must be improved as much as possible to ensure the quality of assembled products. However, the lack of object detection datasets for PCB assembly scenes is a key factor restricting research on intelligent PCB assembly. As an excellent representative of one-stage object detection models, YOLOv3 (you only look once version 3) places predefined anchors on three feature pyramid layers and performs recognition and localization by regression. However, the number of anchors assigned to each grid cell is usually the same across feature layers of different scales, while the ERF (effective receptive field) of grid cells at different locations varies. This contradiction between the uniform distribution of fixed-size anchors and the ERF size ranges of the different feature layers reduces detection effectiveness, yet few studies have used the ERF as a criterion for assigning anchors to improve detection accuracy. To address these issues, we first constructed a PCB assembly scene object detection dataset, which includes 21 classes of detection objects in three scenes: before, during, and after assembly. Second, we performed a refined ERF analysis on each grid cell of the three output layers of YOLOv3, determined the ERF range of each layer, and proposed an anchor allocation rule based on the ERF. Finally, for small and difficult-to-detect TH (through-hole) objects, we enlarged the context information and designed a joint module combining an improved ASPP (atrous spatial pyramid pooling) with channel attention. Experiments on the PCB assembly scene object detection dataset show that, within the YOLOv3 framework, ERF-based anchor allocation increases mAP (mean average precision) from 79.32% to 89.86%. Moreover, the proposed method achieves a better balance between high detection accuracy and low computational complexity than Faster R-CNN (region-based convolutional neural network), SSD (single shot multibox detector), and YOLOv4 (you only look once version 4).
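
To make the anchor allocation idea concrete, the sketch below assigns k-means anchors to YOLOv3 output layers according to whether each anchor's size falls within a layer's ERF range, instead of the default fixed split of three anchors per layer. This is a minimal illustration only: the ERF ranges and anchor sizes shown are hypothetical placeholders, and the paper's exact allocation rule and measured per-layer ERF values are not specified in this summary.

```python
# Sketch: ERF-based anchor allocation across YOLOv3's three output layers.
# The ERF ranges below are hypothetical; the paper derives them from a
# refined per-grid-cell ERF analysis of each output layer.
import math

# Hypothetical ERF ranges (in input pixels) for the 52x52, 26x26, and
# 13x13 output layers of a 416x416 YOLOv3; replace with measured values.
ERF_RANGES = {
    "52x52": (0, 60),
    "26x26": (45, 160),
    "13x13": (120, math.inf),
}

def assign_anchors_by_erf(anchors):
    """Map each (w, h) anchor to every layer whose ERF range covers its
    characteristic size (geometric mean of width and height)."""
    allocation = {layer: [] for layer in ERF_RANGES}
    for w, h in anchors:
        size = math.sqrt(w * h)  # anchor scale in input pixels
        for layer, (lo, hi) in ERF_RANGES.items():
            if lo <= size < hi:
                allocation[layer].append((w, h))
    return allocation

# Nine anchors as produced by k-means clustering (illustrative values only).
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
print(assign_anchors_by_erf(anchors))
```

Note that with this rule the number of anchors per layer is no longer forced to be equal: a layer receives exactly the anchors whose scale its ERF can cover.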
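For the context-enlargement component, the following PyTorch sketch shows one plausible form of an "improved ASPP plus channel attention" joint module. The dilation rates, the SE-style squeeze-and-excitation attention, and the placement on the small-object (52x52) head are assumptions for illustration; the summary above states only that an improved ASPP and channel attention are combined to add context for small TH objects.

```python
# Sketch: joint ASPP + channel attention module (assumed structure).
import torch
import torch.nn as nn

class ASPPChannelAttention(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4), reduction=16):
        super().__init__()
        # Parallel atrous branches gather multi-scale context around
        # small through-hole (TH) objects.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.LeakyReLU(0.1, inplace=True),
            )
            for r in rates
        ])
        fused = out_ch * len(rates)
        # SE-style channel attention re-weights the concatenated branches.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, out_ch, 1, bias=False)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        feats = feats * self.attn(feats)   # channel-wise re-weighting
        return self.project(feats)

# Example: apply the module on the 52x52 (small-object) feature map.
x = torch.randn(1, 256, 52, 52)
print(ASPPChannelAttention(256, 256)(x).shape)  # torch.Size([1, 256, 52, 52])
```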