Non-Line-of-Sight (NLOS) conditions significantly reduce the accuracy of ranging and positioning in Ultra-Wideband (UWB) positioning systems based on time of arrival. The mainstream remedy is to use machine learning to distinguish Line-of-Sight (LOS) from NLOS conditions and then apply the corresponding corrections to improve positioning accuracy. However, existing research and applications of machine learning methods do not fully consider the case in which the test data fall outside the range covered by the training set, even though this is the case encountered in real deployments. In this paper, we first show experimentally that, when the test set and the training set are independent, the way the training set is acquired directly affects the accuracy of the machine learning model: the denser the training set, the higher the accuracy, but also the greater the collection workload. To guide the choice of a training set collection scheme, we therefore propose an evaluation index for determining the best scheme. Commonly used machine learning algorithms, such as Random Forest (RF) and XGBoost, are used to explore the parameter configuration of this index. The results show that, when machine learning is used to identify UWB NLOS scenes, a training set acquisition step size of 1.6 meters yields high identification accuracy with a small acquisition workload.
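
To make the trade-off between acquisition step size and collection workload concrete, the following minimal sketch counts the collection points of a regular square grid for several step sizes. It is an illustration only and is not the evaluation index proposed in the paper: the 8 m by 8 m test area, the square-grid layout, and the helper names `grid_points` and `workload` are all assumptions introduced here.

```python
import math


def grid_points(width_m, length_m, step_m):
    """Return the (x, y) collection points of a square grid with spacing step_m.

    Assumption: points start at the origin corner and are spaced step_m apart
    along both axes, staying inside a width_m x length_m rectangular area.
    """
    if step_m <= 0:
        raise ValueError("step_m must be positive")
    # A small epsilon guards against floating-point error in the division.
    nx = math.floor(width_m / step_m + 1e-9) + 1
    ny = math.floor(length_m / step_m + 1e-9) + 1
    return [
        (round(i * step_m, 6), round(j * step_m, 6))
        for i in range(nx)
        for j in range(ny)
    ]


def workload(width_m, length_m, step_m):
    """Number of collection points, used here as a simple proxy for workload."""
    return len(grid_points(width_m, length_m, step_m))


if __name__ == "__main__":
    # Hypothetical 8 m x 8 m area; the step sizes are illustrative values.
    for step in (0.4, 0.8, 1.6, 3.2):
        print(f"step = {step} m -> {workload(8.0, 8.0, step)} collection points")
```

Under these assumptions, halving the step size roughly quadruples the number of collection points, which is why a step size that still gives high identification accuracy is worth searching for.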