Traffic data are the basis of traffic control, planning, management, and other applications. Incomplete traffic data hinder many aspects of transport research and related activities, and can cause adverse effects such as errors in traffic-state identification and poor control performance. For intelligent transportation systems, data recovery strategies have become increasingly important because traffic-system applications depend on the quality of the traffic data. In this study, a bidirectional k-nearest neighbor (KNN) searching strategy was constructed to detect and recover abnormal data effectively, taking into account the symmetric time network and the correlation of traffic data in the time dimension. Moreover, the state vector of the proposed strategy was designed on the basis of bidirectional retrieval to improve accuracy. The proposed bidirectional searching strategy achieves significantly higher accuracy than previous methods.

Symmetry 2019, 11, 815 2 of 18

and model parameter selection can be shown. Section 4 introduces the basic KNN approach. Section 5 proposes a novel bidirectional data recovery approach and presents the parameter setting process. Section 6 discusses the experimental design and the results. Section 7 concludes the paper with a summary of the bidirectional searching strategy and suggestions for future work.
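The abstract does not define the bidirectional state vector, so the following is only a minimal sketch under our own assumptions, not the authors' design. We assume that each day is a vector of readings at fixed intervals, that a complete historical matrix is available, and that the state vector for a missing reading at interval `t` is the `w` readings before `t` plus, in bidirectional mode, the `w` readings after `t`. The neighbors are the `k` historical days with the smallest Euclidean distance, and the estimate is their plain mean at `t`. The function name `knn_recover`, the window `w`, and the unweighted averaging are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_recover(history, target, t, w=2, k=3, bidirectional=True):
    """Estimate target[t] from the k historical days with the closest state vectors.

    Hypothetical sketch: history is an (n_days, n_intervals) array of complete
    daily profiles; target is one day's profile whose value at interval t is
    missing or abnormal.
    """
    history = np.asarray(history, dtype=float)
    target = np.asarray(target, dtype=float)
    n_intervals = target.shape[0]

    # Forward part of the state vector: the w intervals preceding t (clipped at day start).
    idx = list(range(max(0, t - w), t))
    if bidirectional:
        # Backward part: the w intervals following t (clipped at day end).
        idx += list(range(t + 1, min(n_intervals, t + w + 1)))
    if not idx:
        raise ValueError("state vector is empty; increase w")

    state = target[idx]
    # Euclidean distance between the target state vector and each historical day's.
    dists = np.linalg.norm(history[:, idx] - state, axis=1)
    # Stable sort so that ties are broken by day order.
    nearest = np.argsort(dists, kind="stable")[:k]
    # Unweighted mean of the neighbors' readings at interval t.
    return float(history[nearest, t].mean())


if __name__ == "__main__":
    history = [
        [11, 21, 90, 95, 99],  # same early readings, very different later ones
        [10, 20, 30, 40, 50],
        [12, 22, 32, 42, 52],
    ]
    target = [11, 21, float("nan"), 41, 51]  # reading at t = 2 is missing
    print(knn_recover(history, target, t=2, w=2, k=2))                       # 31.0
    print(knn_recover(history, target, t=2, w=2, k=2, bidirectional=False))  # 60.0
```

In this toy example, a search that uses only the preceding readings matches the first historical day, which diverges after `t`. Adding the following readings to the state vector excludes that day, which illustrates why retrieval in both time directions can improve accuracy.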
Literature Review

Several works have focused on recovering abnormal data as part of data quality control strategies. These studies examine the effect of abnormal data and develop data recovery approaches based on historical data to improve data quality. Pushkar et al. [17] applied catastrophe theory to establish a three-parameter sudden-change surface of traffic flow to recover data and then proposed a speed estimation method. A nearest-neighbor imputation algorithm was developed to interpolate missing data using the average of historical data at the same time interval [18]. In addition, the factor approach estimates missing data using the mean value of key factors selected from the historical data set [19]. Troyanskaya et al. [20] investigated automated methods for estimating missing data to minimize the effect of incomplete data sets on analyses. Smith et al. [21] performed a preliminary analysis of several heuristic and statistical imputation techniques and concluded that the statistical techniques are more accurate. Chen et al. [22] proposed a linear regression algorithm to impute missing or erroneous traffic flow and occupancy data using data from neighboring sensors. Abdella et al. [23] proposed an integrated method that combines genetic algorithms with neural networks to approximate missing data in a database. Tang et al. [24] developed a hybrid approach integrating a Fuzzy C-Means (FCM)-based imputation method with a genetic algorithm (GA) to estimate missing traffic volume data from inductive loop sensor outputs. This approach ...