Using Sampling Techniques to Reduce the Run Times for Revealed Preference Discrete Choice Logit Model Estimation



Peter Davidson, Peter Davidson Consultancy, Rob Culley, Peter Davidson Consultancy, Oliver Capon, Peter Davidson Consultancy


Revealed preference logit model estimations with a large number of choice observations and alternatives can result in long run times. We describe the impact of, and the issues resulting from, sampling the choice observations.


For revealed preference logit model estimation, both the number of choice observations and the number of alternatives can be very large, making run times so long that estimation becomes infeasible (for example, a few months per run). Researchers and practitioners have adopted various strategies for reducing estimation run times, including sampling from the alternatives, for example by randomly sampling the destination zones that are modelled individually. We propose and analyse a different approach in which the choice observations themselves are sampled. We applied this approach in our recent recalibration of the Transport Model for Scotland, where the number of records was very large, and this paper describes how sampling reduced the estimation run times while still giving robust results.
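Sampling the choice observations, rather than the alternatives, can be sketched as below. Because the logit log-likelihood is a sum over observations, the cost of each log-likelihood evaluation falls roughly in proportion to the fraction of observations retained, while every retained record keeps its full set of alternatives. This is a minimal Python sketch under stated assumptions: the function name `sample_observations` and its parameters are hypothetical, and each element of `records` is assumed to be one complete choice observation.

```python
import random


def sample_observations(records, fraction, replace=False, seed=None):
    """Draw a subsample of whole choice observations (hypothetical helper).

    Each record is kept intact with all of its alternatives; only the set of
    observations is reduced. Without replacement a record appears at most
    once; with replacement (a bootstrap-style draw) records may repeat.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n = max(1, round(fraction * len(records)))
    if replace:
        return rng.choices(records, k=n)
    return rng.sample(records, n)
```

For example, `sample_observations(records, 0.1, seed=1)` returns 10% of the records with no duplicates, whereas passing `replace=True` may return some records more than once.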
The demand model to be calibrated was a nested mode and destination choice model with over 700 zones. Conventionally, as much data as possible should be used to estimate the coefficients of a logit choice model, but run times can become a significant constraint, especially for model forms more sophisticated than multinomial logit. We varied both the samples drawn from the data records and the sample sizes, and fitted each sample to a nested logit choice structure independently. The resulting sets of coefficients were then combined. We describe the statistics of the combined results and compare the overall results of the different sampling methods with a full run in which all records were included. We also describe the problems encountered, such as convergence failures, counter-intuitive coefficient signs, low t-statistics and wildly different coefficient estimates, together with a suggested approach for dealing with them. We are able to quantify the effect of the alternative strategies. Different sampling techniques, including sampling with and without replacement, are also described.
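The procedure of fitting each sample independently and then combining the coefficient sets can be illustrated with a small sketch. It rests on several assumptions that are not taken from the study: a plain multinomial logit fitted by gradient ascent stands in for the nested logit structure actually used, the data are synthetic, and the samples are combined by simple averaging, with the standard deviation across runs reported as a measure of spread, because the exact combination rule is not stated here. All function names are hypothetical.

```python
import math
import random
import statistics


def choice_probabilities(beta, alternatives):
    """Multinomial logit probabilities for one choice set."""
    utilities = [sum(b * x for b, x in zip(beta, attrs)) for attrs in alternatives]
    top = max(utilities)  # subtract the maximum utility for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def fit_mnl(observations, n_coef, iterations=200, step=0.5):
    """Maximise the MNL log-likelihood by plain gradient ascent.

    Each observation is (alternatives, chosen_index), where alternatives is a
    list of attribute vectors. The gradient is averaged over observations.
    (Stand-in for a nested logit estimator; illustration only.)
    """
    beta = [0.0] * n_coef
    for _ in range(iterations):
        grad = [0.0] * n_coef
        for alternatives, chosen in observations:
            probs = choice_probabilities(beta, alternatives)
            for k in range(n_coef):
                expected = sum(p * attrs[k] for p, attrs in zip(probs, alternatives))
                grad[k] += alternatives[chosen][k] - expected
        beta = [b + step * g / len(observations) for b, g in zip(beta, grad)]
    return beta


def estimate_on_samples(observations, n_coef, sample_size, runs, replace=False, seed=0):
    """Fit each sample independently, then combine by mean and spread.

    runs must be at least 2 so that a standard deviation can be computed.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        if replace:
            sample = rng.choices(observations, k=sample_size)
        else:
            sample = rng.sample(observations, sample_size)
        estimates.append(fit_mnl(sample, n_coef))
    mean = [statistics.mean(col) for col in zip(*estimates)]
    spread = [statistics.stdev(col) for col in zip(*estimates)]
    return estimates, mean, spread


def simulate(n_obs, true_beta, n_alt=3, seed=42):
    """Synthetic choice data drawn from a known MNL model (illustration only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        alternatives = [[rng.uniform(0, 2) for _ in true_beta] for _ in range(n_alt)]
        probs = choice_probabilities(true_beta, alternatives)
        u, cumulative, chosen = rng.random(), 0.0, n_alt - 1
        for j, p in enumerate(probs):
            cumulative += p
            if u < cumulative:
                chosen = j
                break
        data.append((alternatives, chosen))
    return data
```

For example, `estimate_on_samples(simulate(2000, [-1.0, 0.5]), n_coef=2, sample_size=300, runs=4)` fits four samples of 300 observations each and returns the per-run estimates together with their mean and spread. A large spread across runs is one symptom of the wildly different estimates mentioned above, which larger samples would be expected to reduce.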
We provide overall guidelines based on our experience on this project and show where significant time savings can be achieved. Some limitations of the approach are discussed.


Association for European Transport