Neovascular age-related macular degeneration (nAMD) is one of the major causes of irreversible blindness and is characterized by the accumulation of fluid within the retina. Early detection and activity monitoring of three predominant types of fluid, namely intra-retinal fluid (IRF), sub-retinal fluid (SRF), and pigment epithelium detachment (PED), are critical for successful treatment. Spectral-domain optical coherence tomography (SD-OCT) revolutionized nAMD treatment by providing cross-sectional, high-resolution images of the retina. Automatic segmentation and quantification of IRF, SRF, and PED in SD-OCT images can therefore be highly valuable for clinical decision-making. Even with state-of-the-art convolutional neural network (CNN)-based methods, the task remains challenging because the fluids vary substantially in location, size, shape, and texture. This work is the first to adopt a transformer-based method to automatically segment retinal fluid in SD-OCT images and to evaluate its performance qualitatively and quantitatively against CNN-based methods. The method combines the efficient long-range feature extraction and aggregation capabilities of Vision Transformers (ViTs) with the data-efficient training of CNNs. The proposed method was tested on a private dataset of 3,842 two-dimensional SD-OCT retinal images, manually labeled by experts at the Franziskus-Eye-Hospital. Although one of the competing methods achieves a higher Dice score, the proposed method is substantially less computationally expensive. Future research will therefore focus on refining the proposed network's architecture to improve its segmentation performance while maintaining its computational efficiency.
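The comparison above is reported in terms of the Dice score. As a minimal sketch, the snippet below assumes that each SD-OCT B-scan is stored as an integer label map with a hypothetical encoding (0 = background, 1 = IRF, 2 = SRF, 3 = PED) and shows how a per-class Dice score could be computed. It is not the evaluation code used in this work. The function name `dice_per_class` and the label encoding are illustrative assumptions.

```python
import numpy as np

# Hypothetical label encoding (an assumption, not taken from the dataset):
# 0 = background, 1 = IRF, 2 = SRF, 3 = PED
CLASS_NAMES = {1: "IRF", 2: "SRF", 3: "PED"}


def dice_per_class(pred: np.ndarray, target: np.ndarray) -> dict:
    """Return Dice = 2|P ∩ T| / (|P| + |T|) for each foreground fluid class."""
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    scores = {}
    for label, name in CLASS_NAMES.items():
        p = pred == label    # binary mask of predicted pixels for this class
        t = target == label  # binary mask of ground-truth pixels for this class
        intersection = np.logical_and(p, t).sum()
        denom = p.sum() + t.sum()
        # Convention (a design choice): a class absent from both masks scores 1.0
        scores[name] = 1.0 if denom == 0 else float(2.0 * intersection / denom)
    return scores


# Toy 3x3 label maps, purely illustrative
target = np.array([[0, 1, 1], [0, 2, 2], [3, 3, 0]])
pred = np.array([[0, 1, 0], [0, 2, 2], [3, 0, 0]])
print(dice_per_class(pred, target))
```

In this toy example, SRF is segmented perfectly (Dice 1.0), while IRF and PED each miss one of two ground-truth pixels and score 2/3. Scoring a class that is absent from both the prediction and the label as 1.0 is only one common convention; other evaluations skip such cases instead, which can change averaged results.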