Small polyp region detection in wireless capsule endoscopy (WCE) images is a challenging task in computer vision owing to two major problems: the variation of polyps in shape, texture, and size, and the low illumination in the gastrointestinal tract. This study proposes a multiscale pyramidal fusion single-shot multibox detector network (MP-FSSD) to detect small polyp regions in WCE frames, colonoscopy frames, or both, with FSSD chosen as the base architecture in view of the precision-vs-speed trade-off. We investigated deep transfer learning by transferring knowledge to polyp images, thereby enabling the extraction of highly representative features and contextual information from the FSSD. First, an edge-pooling layer was embedded in the shallow part of the network. Subsequently, the feature maps from different layers and scales were transformed to match their sizes. A concatenation module then integrated the feature maps from the different layers and delivered the fused map to the next layer, where downsampling blocks generated new pyramidal layers. Finally, the feature maps were fed to the multibox detectors to predict the final detection results. In the experiments, we kept the same hyperparameters for both datasets to ensure a fair comparison. The proposed MP-FSSD network exceeded FSSD by 3.62% in terms of mean average precision (mAP). Its testing speed of 62.5 FPS is superior to that of the competing detection methods. The results demonstrate that deep learning has much room for development in the field of gastrointestinal image detection.

INDEX TERMS Deep transfer learning, edge pooling, feature maps fusion, image augmentation, polyp, single-shot multibox detector (SSD), wireless capsule endoscopy (WCE) images.
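
The fusion steps summarized in the abstract (matching feature-map sizes, concatenating them, then downsampling to build new pyramidal layers) can be sketched at the level of tensor shapes. The following is a minimal illustration only, not the authors' implementation: the source-layer shapes, the 256-channel reduction, the 512-channel pyramid width, the 3x3 stride-2 downsampling block, and every function name are assumptions made for the example, and the edge-pooling layer is omitted.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def resize_to(shape: Shape, size: int, channels: int) -> Shape:
    """Shape after a channel-reducing 1x1 conv and interpolation to size x size.

    The input shape is not needed for the result; it is kept as a parameter
    so the call reads like the operation it stands for.
    """
    return (channels, size, size)


def concat(shapes: List[Shape]) -> Shape:
    """Shape after channel-wise concatenation; spatial sizes must match."""
    if len({(h, w) for _, h, w in shapes}) != 1:
        raise ValueError("feature maps must share a spatial size before concatenation")
    _, height, width = shapes[0]
    return (sum(c for c, _, _ in shapes), height, width)


def downsample(shape: Shape, channels: int) -> Shape:
    """Shape after an assumed 3x3 conv with stride 2 and padding 1."""
    _, height, width = shape
    return (channels, (height - 1) // 2 + 1, (width - 1) // 2 + 1)


def build_pyramid(
    sources: List[Shape],
    target_size: int = 38,        # assumed common spatial size
    reduced_channels: int = 256,  # assumed per-source channel count
    pyramid_channels: int = 512,  # assumed width of the new pyramid levels
    extra_levels: int = 4,        # assumed number of downsampling blocks
) -> List[Shape]:
    """Return the fused map followed by the downsampled pyramid levels."""
    resized = [resize_to(s, target_size, reduced_channels) for s in sources]
    levels = [concat(resized)]
    for _ in range(extra_levels):
        levels.append(downsample(levels[-1], pyramid_channels))
    return levels


if __name__ == "__main__":
    # Hypothetical source layers from a shallow, a middle, and a deep stage.
    hypothetical_sources = [(512, 38, 38), (1024, 19, 19), (512, 10, 10)]
    for level in build_pyramid(hypothetical_sources):
        print(level)
```

With these assumed inputs, the fused map has 768 channels at 38x38, and the four downsampling blocks yield 19x19, 10x10, 5x5, and 3x3 levels, each of which would be passed to a multibox detector head.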