Synthetic populations are used in many transport models, including trip-based, hybrid trip, tour-based, and activity-based models. As mobility decisions depend on both individuals’ characteristics and family situation, it is essential to generate a two-layered population that accounts for the household level as well as the individual level. In the literature, three main categories of methods for two-layered population generation have been proposed: synthetic reconstruction (SR), combinatorial optimization (CO), and statistical learning (SL). SR and CO methods produce synthetic populations by replicating individuals, whereas SL methods generate a population from an estimated joint probability distribution. However, selecting a generation process is not straightforward, as it depends on the input data and on the characteristics of the synthetic population. To the best of our knowledge, no clear methodology for selecting between these methods exists. The main objectives of this paper are to provide (1) a detailed description of the available methods, (2) a comparison between these methods, and (3) a decision-making procedure for selecting between these methods. The description and comparison of the methods rely on different criteria: marginals availability, sample size, number of potential attributes that can be handled, population size to generate, susceptibility to the zero-cell problem, and so forth. The advantages and shortcomings of each method are illustrated, and method performance is assessed. The decision-making procedure takes the form of a decision tree. Researchers and practitioners now have access to a comprehensive and unified framework for selecting the appropriate method depending on the available data and the features of their modeling purposes.
This article describes the generation of a detailed two-layered synthetic population of households and individuals for French municipalities. Using French census data, four synthetic reconstruction methods associated with two probabilistic integerization methods are applied. The paper offers an in-depth description of each method through a common framework. A comparison of these methods is then carried out on the basis of various criteria. Results showed that the tested algorithms produce realistic synthetic populations with the most efficient synthetic reconstruction methods assessed being the Hierarchical Iterative Proportional Fitting and the relative entropy minimization algorithms. Combined with the Truncation Replication Sampling allocation method for performing integerization, these algorithms generate household-level and individual-level data whose values lie closest to those of the actual population.
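To make the two steps named in this abstract more concrete, the following is a minimal sketch of proportional fitting followed by integerization. It assumes a plain two-dimensional Iterative Proportional Fitting, which is simpler than the hierarchical variant the article evaluates, and a basic reading of Truncation Replication Sampling (keep the integer part of each weight, then sample the remaining units in proportion to the fractional parts). The function names `ipf` and `trs_integerize`, the seed table, and the marginal targets are all hypothetical and are not taken from the article.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Plain 2-D IPF: rescale a seed table until it matches both marginals."""
    weights = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Fit rows; rows that sum to zero are left unchanged (factor 1).
        row_sums = weights.sum(axis=1, keepdims=True)
        weights *= np.divide(row_targets[:, None], row_sums,
                             out=np.ones_like(row_sums), where=row_sums > 0)
        # Fit columns in the same way.
        col_sums = weights.sum(axis=0, keepdims=True)
        weights *= np.divide(col_targets[None, :], col_sums,
                             out=np.ones_like(col_sums), where=col_sums > 0)
        # Columns are exact after the column step, so only rows are checked.
        if np.abs(weights.sum(axis=1) - row_targets).max() < tol:
            break
    return weights


def trs_integerize(weights, rng):
    """Truncate each weight, then sample the leftover units without
    replacement, with probability proportional to the fractional parts."""
    flat = np.asarray(weights, dtype=float).ravel()
    counts = np.floor(flat).astype(int)
    remainders = flat - counts
    deficit = int(round(flat.sum())) - int(counts.sum())
    if deficit > 0:
        probs = remainders / remainders.sum()
        chosen = rng.choice(flat.size, size=deficit, replace=False, p=probs)
        counts[chosen] += 1
    return counts.reshape(np.shape(weights))


# Hypothetical example: a 2x2 seed table and marginals totalling 100 units.
seed = np.array([[1.0, 2.0], [3.0, 4.0]])
row_targets = np.array([40.0, 60.0])
col_targets = np.array([45.0, 55.0])

fitted = ipf(seed, row_targets, col_targets)
synthetic = trs_integerize(fitted, np.random.default_rng(0))
```

Here `fitted` holds fractional cell weights that match both sets of marginals, and `synthetic` holds integer counts with the same total. Each integer count differs from its fractional weight by less than one unit, which is the property that makes this kind of integerization attractive.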
Agent-Based Models (ABMs) are being increasingly used to evaluate urban systems, urban policies and environmental impacts. One prerequisite for using the ABM framework consists of generating a synthetic population representative of the actual population, featuring the appropriate attributes with respect to model objectives. A precise spatial positioning of the synthetic population agents is often key to ensuring ABM modeling quality. This paper considers the problem of allocating synthetic population agents to a finer spatial scale. Such an allocation process is performed from a higher-level statistical area where a synthetic population can be generated, that is, a container statistical area (CSA), to several nested non-overlapping elementary statistical areas (ESAs), where only marginals are available. This allocation step relies not only on common attributes between CSA and ESA, but also on additional discriminatory attributes, that is, attributes of interest, estimated from external data sources. The case study examined herein is based on French census and fiscal data. Common attributes include eight socio-demographic variables, totaling 17 modalities. An additional attribute of interest, that is, income, has also been added. The allocation problem at hand is modeled as an integer quadratic programming problem. An exact algorithm is first applied to solve the problem; the applicability of this algorithm proves to be limited to small-size synthetic populations. A heuristic is proposed to handle the allocation of larger-size synthetic populations. Tests carried out on the case study show that this heuristic yields near-optimal solutions; it is also computationally efficient and may fulfill the needs of a majority of users.