The unicity and reidentifiability of records in large-scale databases have been studied in different contexts and with different approaches, focusing either on preserving privacy or on matching records from different data sources. With an increasing number of service providers now routinely collecting location traces of their users on unprecedented scales, there is pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on the reidentifiability of spatial data and on trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. between two datasets that each consist of several million people's mobility traces, one coming from a mobile network operator and the other from transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success rate after only one week of observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that, with higher-frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals.

The main contributions of this paper are the following:

1. We study the problem of matchability using two datasets which correspond to a significant sample of the population in the area considered. To the best of our knowledge, this is the first attempt to estimate the potential for merging datasets on this scale. This presents a realistic scenario in terms of computational complexity and data density, i.e. the number of false positives is non-negligible.

2. We develop and evaluate a matching methodology which can handle data of this size; a main objective is to perform the matching without having to evaluate a similarity metric for every pair of users, which would present prohibitively high computational complexity. We make our implementation available to the research community as open-source software which performs the search efficiently on datasets each consisting of a few hundred million records of several million users.

3. We develop an empirical framework for establishing the matchability of the datasets and use it to evaluate the expected success rate of the matching methodology and to estimate the data collection period required for successful matching of users given their activity. This work is extensible to more complex search and matching strategies as well.
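
To make the idea of co-occurrence-based matching without an all-pairs comparison more concrete, the following is a minimal sketch and not the implementation released with this work. It assumes that records are tuples of (user id, time in seconds, planar x and y coordinates in meters), and that two records co-occur when they fall into the same space-time cell. The cell sizes `dt` and `dx`, and the function names `bin_key` and `match_by_cooccurrence`, are hypothetical choices made for illustration. An inverted index from cells to users limits the comparison to pairs of users that share at least one cell.

```python
from collections import Counter, defaultdict


def bin_key(t, x, y, dt=300.0, dx=500.0):
    """Map a record to a discrete space-time cell (assumed cell sizes)."""
    return (int(t // dt), int(x // dx), int(y // dx))


def match_by_cooccurrence(records_a, records_b, dt=300.0, dx=500.0):
    """For each user in dataset A, find the user in dataset B with the
    most co-occurring records.

    records_a, records_b: iterables of (user_id, t, x, y).
    Returns a dict: user_a -> (best_user_b, co-occurrence count).
    Users of A without any co-occurrence are omitted.
    """
    # Inverted index: space-time cell -> set of B users seen in that cell.
    index = defaultdict(set)
    for user_b, t, x, y in records_b:
        index[bin_key(t, x, y, dt, dx)].add(user_b)

    # Count co-occurrences only for user pairs that share a cell,
    # so no similarity is ever computed for unrelated pairs.
    counts = defaultdict(Counter)
    for user_a, t, x, y in records_a:
        for user_b in index.get(bin_key(t, x, y, dt, dx), ()):
            counts[user_a][user_b] += 1

    # Pick the candidate with the highest count for each A user.
    return {ua: c.most_common(1)[0] for ua, c in counts.items()}


if __name__ == "__main__":
    records_a = [("a1", 0, 0, 0), ("a1", 1000, 1200, 0), ("a2", 0, 5000, 5000)]
    records_b = [
        ("b1", 10, 10, 10),
        ("b1", 1010, 1210, 10),
        ("b2", 20, 20, 20),
        ("b3", 0, 5100, 5100),
    ]
    print(match_by_cooccurrence(records_a, records_b))
```

This sketch simplifies in several ways: records close to each other but on opposite sides of a cell boundary are not counted as co-occurring, ties are broken arbitrarily, and the candidate with the most co-occurrences is accepted without any test for false positives, which the scenario described above suggests cannot be neglected at realistic data densities.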