Colorectal cancer (CRC) is one of the most common types of cancer and has a high mortality rate. Colonoscopy is the preferred procedure for CRC screening and has proven effective in reducing CRC mortality; a reliable computer-aided polyp detection and classification system could therefore significantly increase its effectiveness. In this paper, we build an endoscopic dataset collected from various sources and, with the help of experienced gastroenterologists, annotate ground-truth polyp locations and classification labels. The dataset can serve as a benchmark for training and evaluating machine learning models for polyp classification. We also compare the performance of eight state-of-the-art deep learning-based object detection models. The results demonstrate that deep CNN models are promising for CRC screening. This work can serve as a baseline for future research in polyp detection and classification.
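The abstract does not name the eight detection frameworks that were compared, so the following is only a minimal sketch of running one off-the-shelf detector (torchvision's pretrained Faster R-CNN, used purely as a stand-in) on a single endoscopic frame; the file name and confidence threshold are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: apply a pretrained object detector to one endoscopic frame.
# torchvision's Faster R-CNN is a stand-in; the paper does not specify which
# eight detection models were evaluated.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("frame_0001.png").convert("RGB")  # hypothetical file name
with torch.no_grad():
    predictions = model([to_tensor(image)])[0]

# Keep detections above a confidence threshold; boxes are [x1, y1, x2, y2].
keep = predictions["scores"] > 0.5
print(predictions["boxes"][keep], predictions["labels"][keep])
```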
Objective: To localize structural laryngeal lesions within digital flexible laryngoscopic images and to classify them as benign or suspicious for malignancy using state-of-the-art computer vision detection models. Study Design: Cross-sectional diagnostic study. Setting: Tertiary care voice clinic. Methods: Digital stroboscopic videos and demographic and clinical data were collected from patients evaluated for a structural laryngeal lesion. Laryngoscopic images were extracted from the videos and manually labeled with bounding boxes encompassing the lesion. Four detection models were employed to simultaneously localize and classify structural laryngeal lesions in laryngoscopic images. Classification accuracy, intersection over union (IoU), and mean average precision (mAP) were evaluated as measures of classification, localization, and overall performance, respectively. Results: In total, 8,172 images from 147 patients were included in the laryngeal image dataset. Classification accuracy was 88.5% for individual laryngeal images and increased to 92.0% when all images belonging to the same sequence (video) were considered. Mean average precision across all four detection models was 50.1% using an IoU threshold of 0.5 to determine successful localization. Conclusion: This study showed that deep neural network-based detection models trained on a labeled dataset of digital laryngeal images have the potential to classify structural laryngeal lesions as benign or suspicious for malignancy and to localize them within an image. This approach provides valuable insight into which part of the image the model used to determine a diagnosis, allowing clinicians to independently evaluate the models' predictions.
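The localization criterion described above (a detection counts as successful when its IoU with the ground-truth box reaches 0.5) can be made concrete with a short sketch; the box coordinates below are hypothetical and the coordinate convention ([x1, y1, x2, y2] in pixels) is an assumption, not stated in the abstract.

```python
# Sketch of the IoU criterion: a predicted box is a successful localization
# when its overlap with the ground-truth box is at least 0.5.
def iou(box_a, box_b):
    # Intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Union = sum of areas minus intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = [120, 80, 260, 210]     # hypothetical lesion prediction
ground_truth = [110, 90, 250, 220]  # hypothetical annotated bounding box
print(iou(predicted, ground_truth) >= 0.5)  # True -> counted as localized
```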
Label assignment plays a significant role in modern object detection. A detection model can yield very different performance under different label assignment strategies. For anchor-based detectors, the IoU (Intersection over Union) threshold between the anchors and their corresponding ground-truth bounding boxes is the key element, since positive and negative samples are divided by this threshold. Early object detectors simply use a fixed threshold for all training samples, while recent detection algorithms adopt adaptive thresholds based on the distribution of IoUs to the ground-truth boxes. In this paper, we introduce a simple yet effective approach that performs label assignment dynamically based on the training status reflected in the model's predictions. By incorporating the predictions into label assignment, more high-quality samples with higher IoUs to the ground-truth objects are selected as positives, which reduces the discrepancy between classification scores and IoU scores and yields higher-quality bounding boxes. Our approach improves the performance of detection models that use adaptive label assignment and lowers the bounding-box losses for positive samples, indicating that more samples with higher-quality predicted boxes are selected as positives.
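To make the contrast between static and prediction-aware assignment concrete, here is a small sketch. The fixed-threshold rule matches the classic strategy described above; the prediction-aware rule, which blends each anchor's prior IoU with the IoU of its current predicted box, is only an illustrative assumption and not the paper's exact algorithm. Boxes are assumed to be in [x1, y1, x2, y2] format with one predicted box per anchor.

```python
import numpy as np

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of boxes, shape (len(a), len(b))."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def fixed_threshold_assignment(anchors, gt_boxes, pos_thr=0.5):
    """Static rule: an anchor is positive if its best IoU to any GT box >= pos_thr."""
    best_iou = iou_matrix(anchors, gt_boxes).max(axis=1)
    return best_iou >= pos_thr

def prediction_aware_assignment(anchors, predicted_boxes, gt_boxes,
                                alpha=0.5, pos_thr=0.5):
    """Illustrative dynamic rule (an assumption, not the authors' method):
    blend the anchor's prior IoU with the IoU of its current predicted box,
    favoring anchors whose predictions already overlap the GT well."""
    anchor_iou = iou_matrix(anchors, gt_boxes).max(axis=1)
    pred_iou = iou_matrix(predicted_boxes, gt_boxes).max(axis=1)
    score = alpha * anchor_iou + (1.0 - alpha) * pred_iou
    return score >= pos_thr
```

In a real detector the predicted boxes would come from the regression head at the current training iteration, so the set of positives shifts as the model improves, which is the sense in which the assignment is dynamic.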