Occam's Razor and Some Randomness: Generating a Synthetic Population for Switzerland
K Mueller, K W Axhausen, IVT, ETH Zurich, CH
This paper compares three multi-level fitting procedures (simultaneously controlling for, e.g., persons and households), and presents a novel approach to introducing heterogeneity. Validation is performed by synthesizing populations for Switzerland.
Agent-based microsimulation models simulate the behavior of agents over time. For land use models (e.g., UrbanSim), time is measured in years, and the simulation aims at predicting a future system state. In contrast, transportation models (e.g., MATSim-T) estimate the utilization of the network during one day. For both kinds of models, the initial step is the definition of agents and their relationships. Synthesizing the population of agents is often the only solution, due to privacy and cost constraints. In this paper, we assume that the model simulates persons grouped into households, and that a person/household population needs to be synthesized. However, the methodology presented here can be applied to other kinds of agent relationships as well, e.g., persons and jobs/workplaces or persons and activity chains.
Generating a synthetic population requires (a) reweighting of an initial population, taken from census or other survey data, with respect to current constraints, and (b) choosing the households that belong to the generated population. The reweighting task can be performed using an Iterative Proportional Fitting (IPF) procedure, which obeys the Principle of Minimum Discrimination Information. However, IPF cannot control for attributes at the person and household levels simultaneously. A frequently applied pattern is to estimate household-level weights using IPF, so that they match the control totals for the households, and then to use these household-level weights to generate a population of households that best fits the person-level control totals.
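As a rough illustration of the reweighting step, the following minimal Python sketch applies IPF to a two-dimensional table by alternately scaling rows and columns to their control totals. The function name `ipf`, the seed table, and the control totals are invented for illustration and are not taken from the Swiss data; this is the textbook procedure, not necessarily the implementation used in this paper.

```python
def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Fit a 2-D seed table to row and column control totals (illustrative)."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Scale each row so that it sums to its control total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [cell * factor for cell in table[i]]
        # Scale each column so that it sums to its control total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows match as well.
        row_error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical seed (e.g., household size by car ownership) and control totals.
seed = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[35.0, 65.0])
```

Because every step only multiplies whole rows or columns by a factor, the cross-product ratio of the seed table is preserved in the fitted table; only the margins change.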
Recently, an alternative fitting routine named Iterative Proportional Updating (IPU) has been proposed. IPU is capable of estimating household-level weights that fit the control totals at both the person and household levels. After such a multi-level fitting procedure, the generation of synthetic households relies only on the estimated weights and no longer needs to control at the person level. Another approach to multi-level fitting presented in recent literature is to estimate the weights directly with the objective of minimizing relative entropy, in accordance with the Principle of Minimum Discrimination Information. We use both procedures, and a novel algorithm for multi-level fitting, to generate synthetic populations of Switzerland. The three approaches are compared with respect to convergence speed, ease of implementation, and goodness-of-fit; for the latter, we check the generated population against the complete Swiss census. This comparison will help in choosing a good strategy for multi-level fitting.
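To make the idea of multi-level fitting more concrete, the sketch below follows the IPU scheme as it is commonly described in the literature: for each control in turn, the weights of all households that contribute to that control are scaled by the ratio of the control total to the current weighted total. The incidence matrix, the control totals, the helper names, and the stopping rule are assumptions made for this example; it is not the novel algorithm evaluated in this paper.

```python
def weighted_totals(incidence, weights):
    """Weighted column sums of the household-by-control incidence matrix."""
    n_controls = len(incidence[0])
    return [
        sum(w * row[j] for w, row in zip(weights, incidence))
        for j in range(n_controls)
    ]


def ipu(incidence, targets, max_iter=1000, tol=1e-9):
    """Estimate household weights that fit household and person controls."""
    weights = [1.0] * len(incidence)
    for _ in range(max_iter):
        for j, target in enumerate(targets):
            total = sum(w * row[j] for w, row in zip(weights, incidence))
            if total == 0:
                continue  # no household contributes to this control
            ratio = target / total
            # Only households that contribute to control j are rescaled.
            weights = [
                w * ratio if row[j] != 0 else w
                for w, row in zip(weights, incidence)
            ]
        # Assumed stopping rule: largest relative deviation over all controls.
        totals = weighted_totals(incidence, weights)
        if max(abs(t - c) / c for t, c in zip(totals, targets)) < tol:
            break
    return weights


# Hypothetical sample of four households.
# Columns: household type A, household type B, persons of type 1, persons of type 2.
incidence = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 2, 0],
    [0, 1, 1, 1],
]
targets = [50.0, 50.0, 90.0, 60.0]  # invented, mutually consistent controls
weights = ipu(incidence, targets)
fitted_totals = weighted_totals(incidence, weights)
```

Each household carries a single weight, so the person-level totals are fitted indirectly through the number of persons of each type in the household.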
A common feature of many recent synthesis procedures is the replication of persons and households: the generated population does not contain any persons or households not present in the reference sample. This is a problem if the number of attributes is large and the reference sample is considerably smaller than the target population, because many identical individuals will then be generated. We evaluate a novel approach to introducing heterogeneity in the synthetic population while preserving its statistical properties. For our Switzerland case study, we compare two synthetic populations, one with and one without heterogeneity. This evaluation will serve as a proof of concept for our procedure.
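The replication effect can be illustrated with a small, purely hypothetical example: when households are drawn from the reference sample in proportion to their weights, every synthetic household is a copy of a sample record. The household identifiers, weights, and the function `draw_households` below are invented for illustration; the heterogeneity procedure itself is not shown.

```python
import random
from collections import Counter


def draw_households(household_ids, weights, n_draws, seed=0):
    """Draw households with replacement, proportionally to their weights."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.choices(household_ids, weights=weights, k=n_draws)


# Hypothetical reference sample of four households and their fitted weights.
sample_ids = ["h0", "h1", "h2", "h3"]
sample_weights = [30.0, 20.0, 10.0, 40.0]

population = draw_households(sample_ids, sample_weights, n_draws=100)
copies = Counter(population)  # number of identical copies per sample record
```

With 100 draws from only four records, at least one record must appear 25 times or more, which is the kind of duplication the heterogeneity step is meant to reduce.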
Association for European Transport