Background: Digital pathology has substantially changed cancer diagnosis, and Content-Based Medical Image Retrieval (CBMIR) has emerged as a powerful tool for analyzing histopathological Whole Slide Images (WSIs). CBMIR lets users search a database for images whose content is similar to a query image, giving pathologists access to collections of cases with comparable features. Such references can make diagnostic decisions more reliable and help pathologists reach accurate diagnoses more quickly. Objective: In 2020, the Global Cancer Observatory (GCO) reported that breast cancer was the most commonly diagnosed cancer worldwide when both sexes are combined, accounting for 11.7% of all new cases, while prostate cancer was the second most common cancer in men, accounting for 14.1% of new cases in men. The proposed Unsupervised CBMIR (UCBMIR) aims to replicate the traditional cancer diagnosis workflow and to provide a dependable way of supporting pathologists when they draw diagnostic conclusions from WSIs. By reducing pathologists' workload, this approach could improve both the accuracy and the efficiency of cancer diagnosis.
Method and results: The study presents a novel approach to the shortage of labeled histopathological images for CBMIR. A customized unsupervised Convolutional Autoencoder (CAE) was developed to extract 200 features per image, which the search engine component then uses to find similar images. UCBMIR was evaluated with two numerical techniques widely used in CBMIR and with visual inspection, and was compared with a classifier to determine whether the retrieved images belong to the same cancer type as the query. Validation used three distinct data sets, including an external evaluation to demonstrate the method's effectiveness. UCBMIR outperformed previous studies: with the first evaluation technique, it achieved a top-5 recall of 99% on BreaKHis and 80% on SICAPv2; with the second, it achieved precision of 91% on BreaKHis and 70% on SICAPv2. Moreover, when trained on SICAPv2 and tested on an external image from Arvaniti, UCBMIR identified various patterns in the patches and achieved a top-5 accuracy of 81% under the second evaluation technique.
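The retrieval and scoring steps can be sketched roughly as follows. The abstract does not state the search engine's distance metric or the exact definitions of its two evaluation techniques, so this is only a minimal illustration under stated assumptions: it assumes Euclidean nearest-neighbour search over the CAE feature vectors, and two common CBMIR metrics, top-k "recall" read as a hit rate (a query counts as a success if at least one of its k results shares its label) and precision@k (the fraction of the k results that share the query's label). The function names are hypothetical, and the toy vectors are 2-dimensional rather than the 200 features the CAE produces.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed distance metric; the abstract does not specify one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_top_k(query: Sequence[float],
                   database: List[Sequence[float]],
                   k: int = 5) -> List[int]:
    # Rank every database feature vector by distance to the query
    # and return the indices of the k closest ones.
    ranked = sorted(range(len(database)),
                    key=lambda i: euclidean(query, database[i]))
    return ranked[:k]


def precision_at_k(query_label: str, retrieved_labels: List[str]) -> float:
    # Fraction of retrieved images whose label matches the query's label.
    hits = sum(1 for lab in retrieved_labels if lab == query_label)
    return hits / len(retrieved_labels)


def hit_at_k(query_label: str, retrieved_labels: List[str]) -> float:
    # 1.0 if at least one retrieved image matches the query's label.
    return 1.0 if query_label in retrieved_labels else 0.0


def evaluate(queries: List[Sequence[float]], query_labels: List[str],
             database: List[Sequence[float]], db_labels: List[str],
             k: int = 5) -> Tuple[float, float]:
    # Return (mean top-k hit rate, mean precision@k) over all queries.
    hits, precisions = [], []
    for q, q_label in zip(queries, query_labels):
        labels = [db_labels[i] for i in retrieve_top_k(q, database, k)]
        hits.append(hit_at_k(q_label, labels))
        precisions.append(precision_at_k(q_label, labels))
    n = len(queries)
    return sum(hits) / n, sum(precisions) / n


if __name__ == "__main__":
    # Hypothetical toy features standing in for 200-dim CAE outputs.
    db = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.2], [0.3, 0.3]]
    db_labels = ["benign", "benign", "benign",
                 "malignant", "malignant", "malignant", "malignant"]
    queries = [[0.05, 0.05], [5.05, 5.05]]
    query_labels = ["benign", "malignant"]
    hit, prec = evaluate(queries, query_labels, db, db_labels, k=5)
    print(hit, round(prec, 3))  # → 1.0 0.7
```

The two metrics differ in strictness: the hit-rate reading rewards a query as soon as one relevant image appears in the top k, while precision@k penalizes every irrelevant result. This is one plausible reason a top-k recall can sit well above precision on the same data set, although the abstract does not confirm that its techniques are defined this way.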