Adaptive optics (AO) systems are trending towards miniaturization and cost reduction, and wavefront sensorless adaptive optics systems (WFSless AOSs) have emerged as a field of interest because of their simple structure and versatility. The advent of deep learning has driven the use of convolutional neural networks (CNNs) to extract aberration information from CCD images. However, CNNs often fail to focus on the image regions where the useful information is concentrated, which limits the accuracy of aberration extraction. This paper introduces a novel Swin-UNet-based model for point-source-based WFSless AOSs that uses an attention mechanism to target the relevant areas of CCD light intensity images, thereby addressing this shortcoming of CNNs. Furthermore, the proposed model fuses in-focus and out-of-focus image information and directly outputs the reconstructed wavefront image, which improves the overall wavefront reconstruction process. Our simulations across various D/r0 ratios show significant improvements with the Swin-UNet-based model: the RMS wavefront error decreases from 0.0219 to 0.0061 wavelengths at D/r0=1, from 0.0806 to 0.03825 wavelengths at D/r0=6, and from 0.1241 to 0.0991 wavelengths at D/r0=11. Correspondingly, the Strehl ratio improves from 0.9950 to 0.9988, from 0.8380 to 0.9567, and from 0.6814 to 0.7522, indicating better image quality after correction. Compared with existing CNN-based techniques, our Swin-UNet approach concentrates more effectively on the relevant areas of the image and mitigates the influence of invalid regions on the analysis, thereby substantially improving the effectiveness and robustness of correction in WFSless AOSs.
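
The two figures of merit quoted above, RMS wavefront error and Strehl ratio, are linked for small aberrations by the extended Maréchal approximation, S ≈ exp(-(2πσ)²), with σ the RMS error in wavelengths. The following is a minimal illustrative sketch of that relationship, not the evaluation code used in this work: the function names `rms_wavefront_error` and `strehl_marechal` are hypothetical. How the Strehl ratios reported above were computed is not stated here, so the approximate values printed by this sketch need not coincide with them.

```python
import math


def rms_wavefront_error(residual_waves):
    """RMS of a residual wavefront (samples in wavelengths), piston (mean) removed."""
    n = len(residual_waves)
    mean = sum(residual_waves) / n
    return math.sqrt(sum((w - mean) ** 2 for w in residual_waves) / n)


def strehl_marechal(rms_waves):
    """Extended Marechal approximation: S ~ exp(-(2*pi*sigma)^2), sigma in wavelengths.

    Only a rough guide, and it degrades as the aberration grows.
    """
    return math.exp(-((2.0 * math.pi * rms_waves) ** 2))


# RMS values quoted in the abstract for the Swin-UNet-based model.
for d_over_r0, rms in [(1, 0.0061), (6, 0.03825), (11, 0.0991)]:
    approx = strehl_marechal(rms)
    print(f"D/r0={d_over_r0}: RMS={rms} waves -> approximate Strehl {approx:.4f}")
```

The approximation is monotonic, so a lower RMS error always maps to a higher estimated Strehl ratio, which matches the trend reported across the three D/r0 values.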