Abstract: The advent of big data has aided understanding of the driving forces of human mobility, which benefits many fields, such as mobility prediction, urban planning, and traffic management. However, the data sources used in many studies, such as mobile phone location data and geo-tagged social media data, are sparsely sampled in time. An individual's records may cover only a few hours of a day or a week, or even just a few hours of a month. Thus, the representativeness of sparse mobile phone location data in characterizing human mobility needs to be analyzed before such data are used to derive human mobility patterns. This paper investigates this issue using subscriber mobile phone location data collected by a major carrier in Shenzhen, China. A dataset of over 5 million mobile phone subscribers that covers 24 h a day is used as a benchmark to test the representativeness of mobile phone location data for human mobility indicators, such as total travel distance, movement entropy, and radius of gyration. The study divides this dataset by hour and uses 2- to 23-h segments to evaluate how representativeness depends on the availability of mobile phone location data. The results show that the number of hourly segments affects estimates of human mobility indicators and can cause overestimation or underestimation at the individual level. On average, the total travel distance and movement entropy tend to be underestimated. The underestimation coefficient for total travel distance declines approximately linearly as the number of time segments increases, and the underestimation coefficient for movement entropy declines logarithmically, whereas the results for the radius of gyration are more ambiguous because of the loss of isolated locations.
This paper suggests that researchers should carefully interpret results derived from this type of sparse data in the era of big data.
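
As an illustration of the three indicators named above, the following minimal Python sketch computes them for one individual and then recomputes them on an hourly subset of the records. It is not the paper's implementation. The definitions are common textbook forms and are assumptions here: total travel distance as the sum of straight-line distances between consecutive records, radius of gyration as the root-mean-square distance from the centroid, and movement entropy as the Shannon entropy of visit frequencies. Coordinates are assumed to be planar (for example, kilometres in a projected grid), and the function names and the `keep_hours` helper are hypothetical.

```python
import math
from collections import Counter


def total_travel_distance(points):
    """Sum of straight-line distances between consecutive records (assumed definition)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def radius_of_gyration(points):
    """Root-mean-square distance of the records from their centroid (assumed definition)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def movement_entropy(points):
    """Shannon entropy (bits) of the visit-frequency distribution (assumed definition)."""
    n = len(points)
    counts = Counter(points)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def keep_hours(records, hours):
    """Hypothetical helper: keep only the locations recorded in the given hours.

    records is a list of (hour, (x, y)) pairs in chronological order.
    """
    return [point for hour, point in records if hour in hours]


# Toy one-day trace: home at (0, 0), workplace at (3, 4), planar units.
records = [(0, (0, 0)), (8, (3, 4)), (12, (3, 4)), (18, (0, 0))]

full = keep_hours(records, range(24))   # all 24 hours observed
sparse = keep_hours(records, {8, 12})   # only two hourly segments observed

full_distance = total_travel_distance(full)
sparse_distance = total_travel_distance(sparse)
```

In this toy trace the sparse subset contains only the workplace, so the total travel distance, movement entropy, and radius of gyration all drop relative to the full-day values. This is the kind of underestimation the abstract describes, although real traces can also yield overestimates for some individuals.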