Simple Approaches for Random Utility Modelling
A Daly, ITS University of Leeds/RAND Europe, UK; S Hess, ITS University of Leeds, UK
Analysis methods for panel data are presented, compared and discussed, focussing particularly on those that avoid the full complexity of taste heterogeneity while allowing for the correlations between the responses of each individual.
In the last few years much work in transport and other fields has been based on data in which surveyed individuals give multiple responses. Such "panel data" most commonly comes from stated choice experiments, though other sources of panel data do occur. Stated choice data has the well-known advantages of permitting experimental design and of allowing consideration of alternatives beyond the current marketplace, and it is usually much cheaper than equivalent revealed preference data. Its applicability is limited, however, because the choices are hypothetical and the choices made in a real-life setting may well be different; care at the survey design and implementation stages may be helpful here. The present paper is concerned with a different issue, namely that in stated choice data, as in any other panel data, responses from a given individual cannot be treated as independent.
For many years the usual approach to panel data was to ignore the problem of correlation. This "naïve" approach yields parameter estimates whose properties have not been fully understood; these properties are explained in the paper. While the parameter estimates themselves may be quite reasonable, the degree of confidence to be attached to them cannot be obtained from naïve analysis.
The second approach to panel data was to devise methods to correct the parameters, and in particular the confidence measures, obtained from the naïve approach. This yielded a series of methods, such as a simple square-root formula or resampling methods like the jackknife or bootstrap. A problem with these approaches is that different results can be obtained depending on the assumptions made during resampling. Other approaches have also been used.
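To make the contrast between naïve and resampling-based confidence measures concrete, the following is a minimal sketch rather than the procedure used in the paper. It replaces the choice model with a much simpler statistic (the share of choices for one alternative) and uses simulated data; the function names, the Beta(2, 2) propensity distribution and the sample sizes are all assumptions made for illustration. The point it shows is only that resampling whole respondents, rather than single responses, gives a larger standard error when responses are correlated within respondents.

```python
import math
import random
import statistics


def simulate_panel(n_respondents, n_choices, seed=0):
    """Simulate binary choices with a respondent-specific propensity (hypothetical data)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_respondents):
        # One propensity per respondent, shared by all of that respondent's choices,
        # which is what induces correlation across the responses.
        p = rng.betavariate(2.0, 2.0)
        panel.append([1 if rng.random() < p else 0 for _ in range(n_choices)])
    return panel


def choice_share(panel):
    """Share of all observations in which the alternative was chosen."""
    flat = [y for respondent in panel for y in respondent]
    return sum(flat) / len(flat)


def naive_se(panel):
    """Standard error that treats every observation as independent."""
    n_obs = sum(len(respondent) for respondent in panel)
    p = choice_share(panel)
    return math.sqrt(p * (1.0 - p) / n_obs)


def panel_bootstrap_se(panel, n_draws=500, seed=1):
    """Bootstrap standard error that resamples whole respondents with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_draws):
        resample = [rng.choice(panel) for _ in panel]
        estimates.append(choice_share(resample))
    return statistics.stdev(estimates)


panel = simulate_panel(n_respondents=200, n_choices=8)
se_naive = naive_se(panel)
se_panel = panel_bootstrap_se(panel)
print(f"share={choice_share(panel):.3f}  naive SE={se_naive:.4f}  panel SE={se_panel:.4f}")
```

Resampling individual observations instead of respondents in `panel_bootstrap_se` would break the within-respondent correlation and give a result close to the naïve figure, which is one example of how the assumptions made during resampling change the answer.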
Recently, there has been an emphasis on random coefficients models allowing for a representation of taste heterogeneity, notably as a result of the work of Revelt & Train (1997), which provided for inter-respondent variation while maintaining intra-respondent homogeneity. This was extended by Hess & Rose (2009) to allow additionally for intra-respondent heterogeneity. However, while these approaches allow the analyst to account for the panel nature of the data, they also open a can of worms: the description of the full variation of parameters within and between individuals, including potential correlations, requires the estimation of very many parameters, which will often be beyond the budget of a study, the capabilities of available software and/or the information content of a data set. Estimating only a subset of these variations, on the other hand, is arbitrary and can easily lead to bias. Additionally, analysts may be interested only in producing point estimates of average valuations, and so allow for heterogeneity only with a view to accounting for the panel nature of the data. Obtaining unbiased point estimates of willingness-to-pay (WTP) measures from models allowing for taste heterogeneity is by no means straightforward.
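As a sketch of why the number of parameters grows so quickly, the two likelihood structures referred to above can be written in the following general form. The notation (the coefficient vectors, the densities f, g and h, the parameter sets written as Omega, and the number of random coefficients K) is introduced here for illustration and is not taken from the paper.

```latex
% Inter-respondent heterogeneity only: one value of \beta per respondent n,
% held fixed across that respondent's T_n choices j_{n1}, ..., j_{nT_n}.
L_n(\Omega) = \int \prod_{t=1}^{T_n} P_{nt}(j_{nt} \mid \beta)\, f(\beta \mid \Omega)\, d\beta,
\qquad
P_{nt}(j \mid \beta) = \frac{\exp(\beta' x_{jnt})}{\sum_{i} \exp(\beta' x_{int})}

% Additional intra-respondent heterogeneity: \beta_{nt} = \alpha_n + \gamma_{nt},
% with \alpha_n varying across respondents and \gamma_{nt} varying across choices.
L_n(\Omega) = \int \prod_{t=1}^{T_n}
  \left[ \int P_{nt}(j_{nt} \mid \alpha + \gamma)\, h(\gamma \mid \Omega_\gamma)\, d\gamma \right]
  g(\alpha \mid \Omega_\alpha)\, d\alpha

% Assuming jointly normal coefficients with a full covariance matrix at each level,
% the number of free parameters for K random coefficients is
% K + K(K+1)/2   (one level),   K + 2 \cdot K(K+1)/2 = K + K(K+1)   (two levels).
```

Under these assumptions, a model with only K = 5 random coefficients already has 20 distributional parameters with inter-respondent variation alone and 35 when intra-respondent variation is added.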
Some analysts have attempted to avoid the pitfalls of introducing a representation of random heterogeneity by instead turning to error components with a view to capturing correlation across choices for the same respondent. These methods are attractive, but require care in specifying the way in which additional variation enters the model. In particular, a large number of published applications making use of these approaches ignore the fact that their models now introduce heteroskedasticity or correlation across alternatives, while other papers rely on specifications beset by identification issues.
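The side effects mentioned above can be seen in a simple illustrative specification. The form below is an assumption chosen for exposition, not necessarily one of the specifications examined in the paper; the symbols are introduced here for the purpose.

```latex
% Utility of alternative j for respondent n in choice situation t:
U_{jnt} = V_{jnt} + \sigma_j \xi_{jn} + \varepsilon_{jnt},
\qquad \xi_{jn} \sim N(0,1) \text{ constant over } t,
\qquad \varepsilon_{jnt} \text{ i.i.d. extreme value (unit scale)}

% Intended effect: correlation across choices of the same respondent (t \neq s)
\operatorname{Cov}(U_{jnt}, U_{jns}) = \sigma_j^2

% Side effect 1: heteroskedasticity whenever the \sigma_j differ across alternatives
\operatorname{Var}(U_{jnt}) = \sigma_j^2 + \pi^2/6

% Side effect 2: if alternatives j and k share one component (\xi_{jn} = \xi_{kn}, \sigma_j = \sigma_k = \sigma),
% they become correlated with each other
\operatorname{Cov}(U_{jnt}, U_{knt}) = \sigma^2

% Identification: with two alternatives and independent components, only the difference matters,
U_{1nt} - U_{2nt} = (V_{1nt} - V_{2nt}) + (\sigma_1 \xi_{1n} - \sigma_2 \xi_{2n}) + (\varepsilon_{1nt} - \varepsilon_{2nt}),
\qquad \sigma_1 \xi_{1n} - \sigma_2 \xi_{2n} \sim N(0, \sigma_1^2 + \sigma_2^2)
% so only \sigma_1^2 + \sigma_2^2 can be estimated, not \sigma_1 and \sigma_2 separately.
```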
The various approaches are compared and contrasted in the paper, with the help of illustrative results obtained from real and simulated data. A number of important pitfalls and mistakes are identified, and two simple approaches are put forward as worthy of more extensive exploitation.
Association for European Transport