Discrete Mixtures of Random Utility Models
S Hess, Imperial College London and RAND Europe, UK; J Polak, Imperial College London, UK; M Bierlaire, EPFL, CH
This paper discusses an alternative approach to the use of continuous statistical distributions for accommodating non-deterministic taste heterogeneity in discrete choice models, making use of discrete mass-point distributions.
The choice probabilities in the random-coefficients (RCL) formulation of the Mixed Multinomial Logit (MMNL) model are calculated as the integral of Multinomial Logit (MNL) choice probabilities over the assumed distribution of the random taste coefficients. This approach allows for a random distribution of tastes across decision-makers, giving the RCL model a crucial advantage over the MNL model (and other closed-form models by extension) in the case where it is impossible to explain sufficient amounts of taste heterogeneity in a deterministic fashion. The same approach can also be used with underlying GEV structures as the integrand, leading to Mixed GEV models.
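As a sketch of the integral described above, the RCL choice probability can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: P_n(i) is the probability that decision-maker n chooses alternative i, x_{nj} is the attribute vector of alternative j, beta is the vector of taste coefficients, f(beta | Omega) is its assumed density with parameters Omega, and utility is taken to be linear in the attributes.

```latex
P_n(i) = \int \frac{\exp(\beta' x_{ni})}{\sum_{j} \exp(\beta' x_{nj})} \, f(\beta \mid \Omega) \, d\beta
```

Replacing the MNL kernel inside the integral with a GEV choice probability would give the corresponding Mixed GEV form.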
Two main technical complications arise with the use of the RCL model. The first is that, except for the case of trivial distribution functions for the random taste coefficients, the integral representing the choice probabilities does not have a closed-form solution, and numerical techniques, typically simulation, need to be used in the estimation and application of the model. Depending on model complexity, the computational cost of these processes can be prohibitively high, despite continuing improvements in computer speed and in the efficiency of simulation techniques. The second issue relates to the choice of distribution. Even though the use of a flexible distribution whose shape is consistent with the intuitive understanding of the true distribution can minimise the bias between the true and postulated distributions, some bias will inevitably remain; cases will arise in which real-world behaviour cannot be characterised adequately by one of a set of standard statistical distributions. One case in point arises in the modelling of tastes which may theoretically have a significant mass at zero but be exclusively positively or negatively signed elsewhere. This issue can be addressed through the use of empirical distributions, retrieving the shape of the actual distribution of tastes in the sample population; however, as long as the distribution is continuous, the cost of simulation remains.
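To make the simulation step concrete, the following minimal sketch approximates the RCL choice probabilities by averaging MNL probabilities over random draws of a single taste coefficient. It is an illustration under stated assumptions, not the estimation code used in the paper: the function names, the two-alternative example, the Normal distribution for the time coefficient and all numerical values are hypothetical.

```python
import math
import random


def mnl_probs(utilities):
    """Multinomial Logit probabilities for a list of systematic utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def simulated_rcl_probs(times, costs, mean_bt, sd_bt, beta_cost,
                        n_draws=500, seed=0):
    """Average MNL probabilities over draws of the time coefficient.

    Assumption (illustrative only): the time coefficient is Normal with
    mean `mean_bt` and standard deviation `sd_bt`; the cost coefficient
    is fixed at `beta_cost`.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(times)
    for _ in range(n_draws):
        bt = rng.gauss(mean_bt, sd_bt)  # one draw of the random taste
        utilities = [bt * t + beta_cost * c for t, c in zip(times, costs)]
        probs = mnl_probs(utilities)
        sums = [s + p for s, p in zip(sums, probs)]
    return [s / n_draws for s in sums]


# Hypothetical example: two alternatives with travel times (minutes) and costs.
sim_probs = simulated_rcl_probs([30.0, 45.0], [4.0, 2.5],
                                mean_bt=-0.05, sd_bt=0.03, beta_cost=-0.4)
```

In estimation, this average has to be recomputed for every observation at every iteration, which is where the computational cost discussed above arises.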
In this paper, we analyse the potential of an alternative approach, based on the use of discrete distributions that distribute the mass among several discrete values. This is similar to the case of a latent-class model, assigning different coefficient values to different parts of the population of respondents. In theory, existing discrete distributions (e.g. Poisson) could be used; however, this comes at the cost of flexibility and again leads to the problem of reconciling the assumptions about the shape of the theoretical distribution with the actual shape of the true distribution. This problem does not exist in the case where a fixed set of coefficient values is used, each with an associated probability, but where the values and associated probabilities are free from any a priori constraints.
Our approach can be seen either as a generalisation of a model using fixed-point estimates, or as a simplified version of a model allowing for a continuous distribution of tastes across respondents. As a generalisation of the fixed-point estimates model, our approach offers more modelling flexibility. A typical example is the analysis of the value of travel-time savings (VTTS). In this case, the time constraint of some individuals is not binding, and their VTTS is zero, while the remaining individuals have a positive value-of-time. Clearly, a point estimate is not appropriate in this context, while standard continuous distributions are not adequate (cf. Hess et al., 2004). Compared to a model with a continuous distribution, the use of a fixed number of support points clearly leads to a reduction in flexibility. On the other hand, the model is free from any assumptions resulting from the choice of a specific statistical distribution in the continuous case. Furthermore, the disadvantages caused by relying on a fixed number of support points should be expected to decrease rapidly with increases in the number of points used.
Another major advantage of our approach is that it does not require simulation. In our proposed approach, the choice probability is given by the weighted sum of the choice probabilities obtained with a number of different values for the taste coefficients, where the weight of a given coefficient value reflects the share of the population of respondents for which this value is the best approximation to their true coefficient value. While summation over a set of support points is a similar process to simulation, which involves summation over a set of draws from the assumed distribution, the discrete approach has the advantage that a higher number of points invariably leads to better performance, while, with simulation, a higher number of draws generally leads to a reduction in simulation bias, but not necessarily to better model performance, especially if the distributional assumptions were sub-optimal. Furthermore, the number of support points needed in the discrete approach will almost inevitably be lower than the number of draws needed in simulation (it is unlikely that as many as several hundred different values are required for good performance, whereas such high numbers of draws are commonly used in RCL models). This does not, however, directly imply that the discrete model has a runtime advantage. Indeed, although the individual optimisation steps may be cheaper due to the lower number of points over which to sum, a higher number of iterations will almost inevitably be required, due to the much higher number of parameters to be estimated (K different values for each taste coefficient, with K-1 associated population shares) when compared to the continuous RCL model (generally two parameters for each randomly distributed coefficient). Finally, it should be noted that the model can be adapted to allow for inter-alternative correlation, either with the help of continuously distributed error-components, or on the basis of an underlying GEV structure.
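The weighted sum described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the Biogeme implementation used in the paper: the function names, the two-alternative example, the three support points (including a mass at zero for the time coefficient) and their shares are all hypothetical.

```python
import math


def mnl_probs(utilities):
    """Multinomial Logit probabilities for a list of systematic utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def discrete_mixture_probs(times, costs, time_coefs, shares, beta_cost):
    """Weighted sum of MNL probabilities over the support points.

    `time_coefs` holds the K mass-point values of the time coefficient and
    `shares` the K population shares, which must sum to one.
    """
    if len(time_coefs) != len(shares):
        raise ValueError("need one share per support point")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to one")
    mixed = [0.0] * len(times)
    for bt, w in zip(time_coefs, shares):
        utilities = [bt * t + beta_cost * c for t, c in zip(times, costs)]
        probs = mnl_probs(utilities)
        mixed = [m + w * p for m, p in zip(mixed, probs)]
    return mixed


def n_parameters(k):
    """Free parameters per coefficient: K values plus K-1 shares."""
    return k + (k - 1)


# Hypothetical example: a 20% mass at zero plus two negative time coefficients.
mix_probs = discrete_mixture_probs([30.0, 45.0], [4.0, 2.5],
                                   time_coefs=[0.0, -0.04, -0.10],
                                   shares=[0.2, 0.5, 0.3],
                                   beta_cost=-0.4)
```

No random draws are involved, so the probabilities are exact for the given support points; the price is the parameter count, which `n_parameters` shows growing as 2K-1 per coefficient against the two parameters typical of a continuous distribution.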
To reduce identification issues, the model was coded in Biogeme in such a way that a priori bounds can be imposed on the different mass-points; this avoids estimates that produce mass-points closely grouped around the mode of the true distribution, and also considerably speeds up estimation. The code also allows us to constrain the mass-points to specific values, allowing researchers to test specific assumptions, such as a mass at zero in VTTS estimation, as mentioned above.
A combination of simulated and real datasets (including data from mode-choice and value-of-time studies) is used in the analysis, to compare the performance of the continuous and mass-point distributions under different circumstances. Our analysis shows that, with a reasonable number of support points, estimation speed stays at acceptable levels. Our initial estimation results further show that a relatively low number of support points (<15) is generally sufficient to obtain good model performance, which is often superior to that of models using various continuous distributions.
Hess, S., Bierlaire, M. & Polak, J. W. (2004), Estimation of value of travel-time savings using Mixed Logit models, Transportation Research A, forthcoming.
Association for European Transport