Deep learning techniques can improve computer-aided diagnosis systems. However, acquiring annotations in the image domain is challenging because of the expertise and time it demands of pathologists. Pathologists often focus on diagnostically relevant regions of a whole slide image rather than examining the entire slide, and the time spent on these critical regions correlates positively with diagnostic accuracy. In this paper, we generate a heatmap that represents pathologists’ viewing patterns during diagnosis and use it to guide a deep learning architecture during training. The proposed system outperforms traditional approaches based on color and texture image characteristics, integrating pathologists’ domain expertise to enhance region of interest detection without requiring annotations for individual cases. Our best model, a U-Net with a pre-trained ResNet-18 encoder, evaluated on a skin biopsy whole slide image dataset for melanoma diagnosis, shows potential in detecting regions of interest, surpassing conventional methods by 20%, 11%, 22%, and 12% in precision, recall, F1-score, and Intersection over Union, respectively. In a clinical evaluation, three dermatopathologists agreed that the model effectively replicates pathologists’ diagnostic viewing behavior and accurately identifies critical regions. Overall, our study demonstrates that incorporating heatmaps as supplementary signals can enhance the performance of computer-aided diagnosis systems. Identifying precise focus areas remains challenging without eye-tracking data, but our approach shows promise in helping pathologists improve diagnostic accuracy and efficiency, streamlining annotation processes, and aiding the training of new pathologists.
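
For concreteness, the four reported metrics (precision, recall, F1-score, and Intersection over Union) can be computed from a predicted and a reference binary region-of-interest mask roughly as in the minimal sketch below. The function name `roi_metrics`, the nested-list mask format, and the convention of returning 0.0 for an empty denominator are illustrative assumptions and are not taken from the evaluation code of this study.

```python
from typing import Dict, Sequence


def roi_metrics(pred: Sequence[Sequence[int]],
                truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-wise metrics for two binary masks of equal shape.

    Hypothetical helper for illustration: 1 marks a region-of-interest
    pixel, 0 marks background.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # predicted ROI, truly ROI
            elif p and not t:
                fp += 1          # predicted ROI, actually background
            elif t and not p:
                fn += 1          # missed ROI pixel

    # Assumed convention: a metric with an empty denominator is 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy 2x3 masks, chosen only to exercise the function.
example_pred = [[1, 1, 0],
                [0, 1, 0]]
example_truth = [[1, 0, 0],
                 [0, 1, 1]]
example_scores = roi_metrics(example_pred, example_truth)
```

On the toy masks there are two true positives, one false positive, and one false negative, so precision, recall, and F1-score each equal 2/3 while Intersection over Union equals 0.5; the example illustrates that Intersection over Union is never higher than F1-score on the same masks.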