Endoscopy is a commonly used clinical method for gastrointestinal disorders. However, the complexity of the gastrointestinal environment can lead to artifacts. Consequently, the artifacts affect the visual perception of images captured during endoscopic examinations. Existing methods to assess image quality with no reference display limitations: some are artifact-specific, while others are poorly interpretable. This study presents an improved cascade region-based convolutional neural network (CNN) for detecting gastrointestinal artifacts to quantitatively assess the quality of endoscopic images. This method detects eight artifacts in endoscopic images and provides their localization, classification, and confidence scores; these scores represent image quality assessment results. The artifact detection component of this method enhances the feature pyramid structure, incorporates the channel attention mechanism into the feature extraction process, and combines shallow and deep features to improve the utilization of spatial information. The detection results are further used for image quality assessment. Experimental results using white light imaging, narrow-band imaging, and iodine-stained images demonstrate that the proposed artifact detection method achieved the highest average precision (62.4% at a 50% IOU threshold). Compared to the typical networks, the accuracy of this algorithm is improved. Furthermore, three clinicians validated that the proposed image quality assessment method based on the object detection of endoscopy artifacts achieves a correlation coefficient of 60.71%.