Objective. Sparse-view computed tomography (SVCT), which can reduce the radiation dose delivered to patients and speed up data acquisition, has become an area of particular interest to researchers. Most existing deep learning-based image reconstruction methods are based on convolutional neural networks (CNNs). Because of the locality of convolution and continuous sampling operations, existing approaches cannot fully model global context feature dependencies, which makes CNN-based approaches less efficient at modeling CT images with varied structural information. Approach. To overcome these challenges, this paper develops a novel multi-domain optimization network based on convolution and the swin transformer (MDST). MDST uses the swin transformer block (STB) as the main building block in both the projection (residual) domain and image (residual) domain sub-networks, which model global and local features of the projections and reconstructed images. MDST consists of two modules, one for initial reconstruction and one for residual-assisted reconstruction. The sparse sinogram is first expanded in the initial reconstruction module by a projection domain sub-network. Then, the sparse-view artifacts are effectively suppressed by an image domain sub-network. Finally, the residual-assisted reconstruction module corrects the inconsistency of the initial reconstruction, further preserving image details. Main results. Extensive experiments on CT lymph node datasets and real walnut datasets show that MDST can effectively alleviate the loss of fine details caused by information attenuation and improve the reconstruction quality of medical images. Significance. The MDST network is robust and can effectively reconstruct images from projections with different noise levels. Unlike the currently prevalent CNN-based networks, MDST uses a transformer as its main backbone, which demonstrates the potential of transformers in SVCT reconstruction.
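
Illustration. To make the sinogram expansion step concrete, the sketch below upsamples a sparse sinogram along the view axis by plain linear interpolation between neighboring views. It is not the MDST projection domain sub-network, which is learned. It is only a non-learned stand-in that shows the input and output shapes of that step. The function name `expand_sinogram`, the integer `factor` parameter, and the assumption of a full 360-degree scan (so that the last view wraps around to the first) are all hypothetical choices made for this example.

```python
def expand_sinogram(sparse, factor):
    """Upsample a sinogram along the view axis by linear interpolation.

    sparse: list of views, each a list of detector readings.
    factor: integer number of output views per input view.
    Assumption (not from the paper): a full 360-degree scan, so the
    last view is interpolated toward the first one.
    """
    n_views = len(sparse)
    full = []
    for i in range(n_views):
        current = sparse[i]
        following = sparse[(i + 1) % n_views]  # wrap around at the end
        for k in range(factor):
            w = k / factor  # interpolation weight in [0, 1)
            full.append(
                [(1 - w) * a + w * b for a, b in zip(current, following)]
            )
    return full


# Two measured views with two detector bins each, expanded to four views.
sparse_sinogram = [[0.0, 2.0], [4.0, 6.0]]
full_sinogram = expand_sinogram(sparse_sinogram, factor=2)
```

A learned sub-network would replace the interpolation weights with features predicted from the data, while the measured views would typically still anchor the expanded sinogram, as they do here for `k = 0`.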