Visible-light optical coherence tomography (VIS-OCT) is an emerging imaging modality that uses wavelengths in the visible range, which are shorter than those of conventional near-infrared (NIR) OCT. It provides one-micron-level axial resolution and improved image contrast, which help separate stratified retinal layers, and it enables microvascular oximetry through spatio-spectral analysis. However, because of practical limits imposed by laser safety and patient comfort, the permissible illumination power is much lower than in NIR OCT, which makes it challenging to obtain high-quality VIS-OCT images and to perform subsequent image analysis, particularly in pathological eyes. Denoising to improve VIS-OCT image quality is therefore an essential step in the overall workflow of clinical VIS-OCT applications. In this paper, we provide the first VIS-OCT retinal image dataset from normal eyes, including retinal layer annotations and noisy-clean image pairs. We propose an efficient co-learning deep learning framework for noisy-input segmentation with an embedded self-supervised denoising process, in which the same neural network performs the denoising and segmentation tasks simultaneously. Task performance is benchmarked qualitatively and quantitatively. When the available annotation drops to 25%, a significant segmentation improvement is observed for certain layers (a 2% higher Dice coefficient than the segmentation-only process), indicating a potential route to annotation-efficient training.
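
For readers unfamiliar with the segmentation metric quoted above, the Dice coefficient for one layer is 2|P ∩ G| / (|P| + |G|), where P and G are the predicted and annotated pixel sets for that layer. The following is a minimal sketch only, not the evaluation code used in this work: the function name `dice_coefficient`, the toy label maps, and the convention of returning 1.0 when a layer is absent from both maps are all assumptions made for illustration.

```python
def dice_coefficient(pred, target, label):
    """Per-layer Dice score: 2|P ∩ G| / (|P| + |G|).

    pred, target: 2D label maps (lists of rows) of identical shape.
    label: integer id of the retinal layer to score (hypothetical ids).
    """
    # Flatten both maps into boolean masks for the requested layer.
    pred_mask = [p == label for row in pred for p in row]
    target_mask = [t == label for row in target for t in row]
    if len(pred_mask) != len(target_mask):
        raise ValueError("pred and target must have the same shape")

    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    total = sum(pred_mask) + sum(target_mask)
    if total == 0:
        # Assumed convention: a layer absent from both maps counts as a match.
        return 1.0
    return 2.0 * intersection / total


# Toy 3x4 label maps with three hypothetical layer ids (0, 1, 2).
pred = [[0, 0, 1, 1],
        [0, 1, 1, 2],
        [2, 2, 2, 2]]
target = [[0, 0, 1, 1],
          [0, 0, 1, 2],
          [2, 2, 2, 2]]

for layer in (0, 1, 2):
    print(layer, round(dice_coefficient(pred, target, layer), 3))
```

Scoring each layer separately, as here, is what allows an improvement to be reported for certain layers rather than only as a single aggregate number.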