In our previous work, we proposed a square wave quadrature amplitude modulation (SW-QAM) scheme for visible light communication (VLC) using an image sensor. Here, we propose a robust and unified system based on a neural decoding method. This method bundles the essential SW-QAM decoding capabilities, such as LED localization, light interference elimination, and unknown parameter estimation, into a single neural network model. This work employs a convolutional neural network (CNN), which can automatically learn unknown parameters, particularly when the input is an image. The neural decoding method provides good solutions for two difficult conditions not covered by our previous SW-QAM scheme: unfixed LED positions and multiple point spread functions (PSFs) arising from multiple LEDs. To this end, three recent CNN architectures (VGG, ResNet, and DenseNet) are modified to suit our scheme, and two further small CNN architectures (VGG-like and MiniDenseNet) are proposed for low-power computing devices. Our experimental results show that the proposed neural decoding method achieves a lower error rate than the theoretical decoder, an SW-QAM decoder with a Wiener filter, across different scenarios. Furthermore, we experiment with the moving-camera problem, i.e., unfixed positions of the LED points. For this case, a spatial transformer network (STN) layer is added to the neural decoding method, and the method with this new layer achieves remarkable results.

INDEX TERMS Visible light communication, image sensor communication (ISC), SW-QAM, optical camera communication (OCC), neural decoding, convolutional neural network (CNN), deep learning.