Validation and Forecasts in Models Estimated from Multi-days Travel Survey.
E Cherchi, Università di Cagliari, IT; C Cirillo, University of Maryland, US
This paper studies the issues related to model validation and forecasting on repeated observations data sets (panel data).
A number of multi-day travel surveys (also generally called panel data) have been collected in the last decade, and consequently their use for travel behaviour modelling and analysis has significantly increased. There are basically two types of multi-day data, depending on whether the same survey is repeated at "separate" times (e.g. once or twice a year for a certain number of years), or over a "continuous" period of time (e.g. seven or more successive days). The first type of panel has been used to study how behaviour changes as the environment varies (i.e. the supply or the socio-economic characteristics) and to study dynamic/shock effects. The second type has been collected to gain insights into activity scheduling and travel planning, and to analyse the variability in daily/weekly travel. Recently, advanced econometric models have been applied to panels in order to capture heterogeneity and different patterns of correlation; however, analyses have often been limited to estimation issues.
This paper studies the problems related to model validation and forecasting on repeated observations data sets, where a small number of individuals continuously provide multiple responses over a certain number of days or weeks. The problem is relevant because these data sets are characterized by a small number of respondents and by repeated observations over a period that is usually one week long, but can be as long as six weeks. The topic is even more important if we consider that researchers in activity-based modelling are trying to extend the usual one-day framework to longer periods of time, and that dynamic effects are now included in the model formulation to account for past history, habit and state dependency effects.
The analyses are based on both simulated and real data and give empirical evidence on the effects that different patterns of correlation have on model forecasts and policy analysis. Model validation is executed on two different types of hold-out sample: one based on a percentage of individuals with all their observations, the other obtained by excluding part of the responses from the entire set of individuals.
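The two hold-out designs described above can be illustrated with a minimal sketch. It is not the authors' procedure: the record layout (`person_id`, `day`), the function names, the 20% share and the choice to hold out the same share of responses from each individual are all assumptions made for illustration.

```python
import random

def split_by_individual(records, holdout_share, seed=0):
    """Hold out a share of individuals together with all their observations."""
    rng = random.Random(seed)
    ids = sorted({r["person_id"] for r in records})
    n_holdout = max(1, round(holdout_share * len(ids)))
    holdout_ids = set(rng.sample(ids, n_holdout))
    estimation = [r for r in records if r["person_id"] not in holdout_ids]
    holdout = [r for r in records if r["person_id"] in holdout_ids]
    return estimation, holdout

def split_by_observation(records, holdout_share, seed=0):
    """Hold out a share of each individual's observations.

    Every individual with at least two observations appears in both samples.
    """
    rng = random.Random(seed)
    by_person = {}
    for r in records:
        by_person.setdefault(r["person_id"], []).append(r)
    estimation, holdout = [], []
    for pid in sorted(by_person):
        obs = by_person[pid]
        n_holdout = max(1, round(holdout_share * len(obs)))
        held = set(rng.sample(range(len(obs)), n_holdout))
        for i, r in enumerate(obs):
            (holdout if i in held else estimation).append(r)
    return estimation, holdout

# Hypothetical toy panel: 10 individuals observed on 7 days each.
records = [{"person_id": p, "day": d} for p in range(10) for d in range(7)]
est_a, hold_a = split_by_individual(records, 0.2)
est_b, hold_b = split_by_observation(records, 0.2)
```

In the first split the hold-out individuals are entirely unseen at estimation time, whereas in the second split every individual contributes to both samples, so any individual-specific correlation is shared between estimation and validation data.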
Simulated data will be used to assess whether model formulations that account for heterogeneity and correlation effects (which correspond to the true model structure), estimated on a subset of individuals or on a subset of observations, are able to provide better forecasts. We also intend to study how individual choices change when level of service variables vary according to transportation policies. In particular, we want to test the ability of the model to recover the real modal shifts when heterogeneity and correlation effects are ignored.
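As a rough illustration of what a "true" modal shift means in simulated panel data, the sketch below generates binary car/bus choices with an individual-specific travel-time coefficient and compares the bus share before and after a hypothetical policy that cuts bus travel times by 20%. Everything in it is an assumption for illustration, not the experimental design of the paper: the binary choice set, the lognormal distribution of the coefficient, the travel-time ranges and the name `simulate_bus_share`.

```python
import math
import random

def simulate_bus_share(n_persons=200, n_days=7, bus_time_factor=1.0, seed=1):
    """Simulate binary car/bus choices for a panel and return the bus share."""
    rng = random.Random(seed)
    bus_choices = 0
    for _ in range(n_persons):
        # Assumed taste heterogeneity: a negative, lognormally distributed
        # travel-time coefficient, drawn once per person and shared by all of
        # that person's days (this induces correlation across responses).
        beta_time = -math.exp(rng.gauss(-2.5, 0.5))
        for _ in range(n_days):
            car_time = rng.uniform(10.0, 40.0)                    # minutes
            bus_time = rng.uniform(15.0, 60.0) * bus_time_factor  # minutes
            # Binary logit probability of choosing bus over car.
            v_diff = beta_time * (bus_time - car_time)
            p_bus = 1.0 / (1.0 + math.exp(-v_diff))
            if rng.random() < p_bus:
                bus_choices += 1
    return bus_choices / (n_persons * n_days)

# Same seed in both scenarios, so tastes, base travel times and random
# draws are identical and only the policy differs.
base_share = simulate_bus_share()
policy_share = simulate_bus_share(bus_time_factor=0.8)
modal_shift = policy_share - base_share
```

A model estimated on such data without the individual-specific coefficient could then be asked to predict the same policy scenario, and its predicted shift compared with `modal_shift`.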
In addition, we will explore whether the findings from simulated data are confirmed when estimating a mode choice model on real data. In this paper, real observations are extracted from a six-week travel diary collected in Karlsruhe (Germany) as part of the Mobidrive survey. The objective is to assess whether models accounting for taste heterogeneity in level of service variables, correlation over daily tours and correlation over individual tours provide better forecasts than a simple multinomial logit, and whether the ability to reproduce real choices increases with model complexity.
Association for European Transport