In human-robot collaboration, perception is essential for the robot to understand the surrounding environment and the position of humans within the working area, a prerequisite for effective and safe collaboration. Human pose estimators based on skeletal models are among the most popular approaches to monitoring the position of humans around the robot, but they discard information such as body volume, which the robot needs for effective collision avoidance. In this paper, we propose a novel 3D human representation derived from body part segmentation that combines high-level semantic information (i.e., human body parts) with volume information. To compute this segmentation, known in the literature as human parsing, we propose a multi-view system based on a camera network. Body parts are segmented in the frames acquired by each camera, projected into 3D world coordinates, and aggregated into a 3D representation of the human that is robust to occlusions. A further 3D data filtering step improves robustness to outliers and segmentation accuracy. The proposed multi-view human parsing approach was tested in a real environment, and its performance was measured in terms of global and class accuracy on a dedicated dataset acquired to thoroughly test the system under various conditions. The experimental results demonstrate the performance gains achieved by the proposed multi-view approach.
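To make the per-camera segmentation, 3D projection, aggregation, and filtering steps concrete, the sketch below illustrates one plausible realization of such a pipeline. It is not the paper's actual method: it assumes each camera provides a per-pixel body-part label map, an aligned metric depth map, and calibrated intrinsics `K` and camera-to-world extrinsics `T_wc`; the per-voxel majority vote and the density-based outlier filter are illustrative stand-ins for the aggregation and filtering described above, and all function and parameter names are hypothetical.

```python
import numpy as np

def backproject_labels(labels, depth, K, T_wc):
    """Back-project per-pixel body-part labels into world coordinates.

    labels : (H, W) int array, 0 = background, >0 = body-part class
    depth  : (H, W) float array, metric depth aligned with labels
    K      : (3, 3) camera intrinsics
    T_wc   : (4, 4) camera-to-world extrinsic transform
    Returns (N, 3) world points and (N,) labels for foreground pixels.
    """
    v, u = np.nonzero(labels > 0)            # pixel coords of body pixels
    z = depth[v, u]
    valid = z > 0                            # drop pixels without depth
    u, v, z = u[valid], v[valid], z[valid]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    pts_cam = np.stack([(u - cx) * z / fx,   # pinhole back-projection
                        (v - cy) * z / fy,
                        z], axis=1)
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    pts_world = (T_wc @ pts_h.T).T[:, :3]
    return pts_world, labels[v, u]

def fuse_views(views, voxel=0.03, min_pts=5):
    """Aggregate labeled points from all cameras on a voxel grid.

    views : list of (points, labels) tuples from backproject_labels.
    Each voxel keeps the majority body-part label across all views;
    sparse voxels (fewer than min_pts observations) are discarded,
    a simple stand-in for the outlier-filtering step.
    """
    pts = np.vstack([p for p, _ in views])
    lab = np.concatenate([l for _, l in views])
    keys = np.floor(pts / voxel).astype(np.int64)
    order = np.lexsort(keys.T)               # group identical voxels
    keys, pts, lab = keys[order], pts[order], lab[order]
    _, starts, counts = np.unique(keys, axis=0,
                                  return_index=True, return_counts=True)
    fused_pts, fused_lab = [], []
    for s, c in zip(starts, counts):
        if c < min_pts:                      # density filter vs. outliers
            continue
        votes = np.bincount(lab[s:s + c])    # majority vote on the label
        fused_pts.append(pts[s:s + c].mean(axis=0))
        fused_lab.append(votes.argmax())
    return np.array(fused_pts), np.array(fused_lab)
```

Under these assumptions, each camera contributes a labeled point cloud; voting on a shared voxel grid reconciles disagreements between overlapping views (e.g., a pixel mislabeled in one camera is outvoted by the others), while the density threshold suppresses isolated points caused by depth noise or segmentation errors.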