The industrial internet of things (IIoT) environment contains various production items, such as pedestrians, robots, and automated guided vehicles. A practical industrial environment requires simultaneous communication and sensing of production items to achieve intelligent production and control. Sensing methods must therefore not only integrate with communication but also accomplish sensing tasks such as recognition and positioning. Compared with traditional sensing media, visible light sensing offers high-speed communication, high sensing accuracy, security, and low energy consumption, and has become a promising sensing technology. Based on the strong directivity of visible light spatial radiation and the consistency between light intensity and position, this paper proposes a multi-scale visible light sensing-region convolutional neural network (VLS-RCNN) framework based on shadow features for multiple-target sensing. The framework lets recognition and positioning assist each other through shared visible light shadow features, and a multi-scale compensation strategy for the shadow region makes the framework more robust. Simulation results show that positioning results in the sensing area improve recognition accuracy, and that recognition results in turn reduce the positioning error without additional overhead. This paper therefore provides a new perspective for sensing technology in the future IIoT, namely sensing objects of interest by utilizing the inherent characteristics of visible light.