Background
Lesion detection is one of the most important clinical tasks in positron emission tomography (PET) for oncology. An anthropomorphic model observer (MO) designed to replicate human observers (HOs) in a detection task is an important tool for assessing task‐based image quality. The channelized Hotelling observer (CHO) has been the most widely used anthropomorphic MO. Recently, deep learning MOs (DLMOs), mostly based on convolutional neural networks (CNNs), have been investigated for various imaging modalities. However, there have been few studies on DLMOs for PET.

Purpose
The goal of this study is to investigate whether DLMOs can predict HOs better than conventional MOs such as the CHO in a two‐alternative forced‐choice (2AFC) detection task using PET images with real anatomical variability.

Methods
Two types of DLMOs were implemented: (1) a CNN DLMO and (2) a CNN‐SwinT DLMO that combines CNN and Swin Transformer (SwinT) encoders. Lesion‐absent PET images were reconstructed from clinical data, and lesion‐present images were reconstructed by adding simulated lesion sinogram data. Lesion‐present and lesion‐absent PET image pairs were labeled by eight HOs (four radiologists and four image scientists) in a 2AFC detection task. In total, 2268 pairs of lesion‐present and lesion‐absent images were used for training, 324 pairs for validation, and 324 pairs for testing. The CNN DLMO, CNN‐SwinT DLMO, CHO with internal noise, and non‐prewhitening matched filter (NPWMF) were compared in the same train‐test paradigm.
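To make the 2AFC paradigm concrete, the following is a minimal Python/NumPy sketch of how an NPWMF observer might make 2AFC decisions. It assumes synthetic white‐noise images with a hypothetical Gaussian lesion rather than PET reconstructions, and it estimates the NPWMF template as the mean lesion‐present minus mean lesion‐absent training image (the NPWMF applies no noise prewhitening). The function names are hypothetical, and `agreement` is only one plausible way to compare MO choices with HO choices; it is not necessarily the prediction‐accuracy metric used in this study. The sketch does not reproduce the CHO, internal noise, or the DLMOs.

```python
import numpy as np


def gaussian_lesion(size: int, sigma: float, amplitude: float) -> np.ndarray:
    """Hypothetical centered 2D Gaussian lesion profile (illustration only)."""
    coords = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(coords, coords)
    return amplitude * np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))


def npwmf_template(present: np.ndarray, absent: np.ndarray) -> np.ndarray:
    """NPWMF template: mean signal difference estimated from training images.

    present, absent: arrays of shape (n_images, H, W). No prewhitening is applied.
    """
    return present.mean(axis=0) - absent.mean(axis=0)


def npwmf_2afc_choices(template: np.ndarray, present: np.ndarray,
                       absent: np.ndarray) -> np.ndarray:
    """Return 1 for each pair where the lesion-present image gets the larger score."""
    w = template.ravel()
    t_present = present.reshape(len(present), -1) @ w  # linear test statistics
    t_absent = absent.reshape(len(absent), -1) @ w
    return (t_present > t_absent).astype(int)


def agreement(mo_choices, ho_choices) -> float:
    """Assumed metric: fraction of pairs where the MO choice matches the HO choice."""
    return float(np.mean(np.asarray(mo_choices) == np.asarray(ho_choices)))


def make_pairs(rng: np.random.Generator, n: int, lesion: np.ndarray):
    """Synthetic 2AFC pairs: unit-variance white noise, lesion added to one image."""
    absent = rng.normal(size=(n,) + lesion.shape)
    present = rng.normal(size=(n,) + lesion.shape) + lesion
    return present, absent


rng = np.random.default_rng(0)
lesion = gaussian_lesion(size=16, sigma=2.0, amplitude=0.5)
train_present, train_absent = make_pairs(rng, 500, lesion)
test_present, test_absent = make_pairs(rng, 100, lesion)

# Template is learned on training pairs and applied to held-out test pairs.
template = npwmf_template(train_present, train_absent)
choices = npwmf_2afc_choices(template, test_present, test_absent)
print(f"NPWMF 2AFC percent correct on synthetic pairs: {choices.mean():.2f}")
```

Because the template is estimated on training pairs and evaluated on separate test pairs, the sketch mirrors the train‐test split described above, although the cross‐validation folds and the human labels are not modeled.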
For comparison, six quantitative metrics that measure how well an MO predicts HOs, including prediction accuracy, mean squared errors (MSEs), and correlation coefficients, were calculated in a 9‐fold cross‐validation experiment.

Results
In terms of the accuracy and MSE metrics, the CNN DLMO and CNN‐SwinT DLMO performed better than the CHO and NPWMF, and the CNN‐SwinT DLMO performed best among the MOs evaluated.

Conclusions
DLMOs can predict HOs more accurately than conventional MOs such as the CHO in PET lesion detection. Combining SwinT and CNN encoders can improve DLMO prediction performance compared with using a CNN encoder alone.