Accounting for Stochastic Variables in Discrete Choice Models



F Diaz, V Cantillo, Universidad del Norte, CO


This paper aims to identify better specifications that account for stochastic variables in discrete choice models.


The estimation of discrete choice models requires measuring explanatory variables such as the socio-economic characteristics of the individual and the attributes of the alternatives in the choice set. These variables are usually assumed to be fixed in repeated sampling. However, some variables are intrinsically stochastic (e.g. travel time). Consequently, even an accurate measurement can be biased, i.e. different from the value actually perceived by the individual.

Furthermore, even intrinsically non-stochastic variables can be measured inaccurately. For instance, it is common practice to assign a single "average" value of variables such as travel time and cost to a temporal and spatial aggregation of diverse trips (e.g. all trips from one zone to another in the rush hour), which induces bias. Measurement errors also occur when variables such as income or preferred departure time are reported by the individual in a revealed preference survey and are subject to cognitive or political bias.

Experimental evidence suggests that discrepancies between the values perceived by the modeller and by the individual can lead to incorrect parameter estimates. On the other hand, there is a trade-off between data quality and collection costs. Hence, this paper aims to identify better specifications that account for stochastic variables in discrete choice models.

First, we carry out an econometric analysis. We show that, under certain conditions, the problem can be addressed by specifying particular versions of the Mixed Logit (ML), such as the Error Component Logit (ECL), since the presence of stochastic variables induces heteroscedasticity between alternatives. We also show that this effect can be confused with apparent taste heterogeneity.
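One way to see the mechanism is the short derivation below. It is a sketch under our own assumptions, not the paper's exact formulation: utility is linear in travel time t and cost c, the errors ε_in are i.i.d. standard Gumbel, and the time perceived by individual n for alternative i scatters around the modeller's value t̃_in by a deviation η_in that is independent of t̃_in and of ε_in (as when a zonal average is assigned to diverse trips). The symbols t̃, η, σ_i, ξ and s_i are introduced here for illustration only.

```latex
U_{in} = \beta_t t_{in} + \beta_c c_{in} + \varepsilon_{in},
\qquad t_{in} = \tilde{t}_{in} + \eta_{in},
\qquad \mathrm{E}[\eta_{in}] = 0,\quad \mathrm{Var}[\eta_{in}] = \sigma_i^2

\Rightarrow\; U_{in} = \beta_t \tilde{t}_{in} + \beta_c c_{in}
  + \underbrace{\beta_t \eta_{in} + \varepsilon_{in}}_{\text{composite error}},
\qquad
\mathrm{Var}[\beta_t \eta_{in} + \varepsilon_{in}] = \beta_t^2 \sigma_i^2 + \frac{\pi^2}{6}

\text{ECL form (if } \eta_{in} \sim N(0, \sigma_i^2)\text{):}\quad
U_{in} = \beta_t \tilde{t}_{in} + \beta_c c_{in} + s_i \xi_{in} + \varepsilon_{in},
\qquad \xi_{in} \sim N(0,1),\quad s_i = |\beta_t| \sigma_i
```

Because σ_i can differ between alternatives, the composite error is heteroscedastic, and an alternative-specific error component s_i ξ_in can absorb it. If instead the dispersion grows with the measured value, say η_in = ω_in t̃_in, then β_t t_in = β_t (1 + ω_in) t̃_in, which looks like a randomly distributed time coefficient: one way in which apparent taste heterogeneity can arise.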

Based on this analysis, we test and compare the performance of the Multinomial Logit (MNL) model against more complex versions of the ML in dealing with stochastic variables. The statistical comparison is made in terms of unbiased parameter estimation, correct calculation of marginal rates of substitution, and demand forecasting under several transport policies. For the experimental analysis we use synthetic data sets with known parameters in a mode choice context, controlling the sample size and the level of stochasticity.
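A minimal sketch of this kind of Monte Carlo experiment follows. The assumptions are ours, not the paper's: a binary rather than multinomial choice, the illustrative parameter values in `true`, a perceived time equal to the recorded time plus independent normal noise of standard deviation `sigma_t`, and only a plain logit (no ECL) fitted by Newton-Raphson in pure Python. The helpers `simulate`, `fit_logit` and `solve` are hypothetical names. With `sigma_t = 0` the logit should recover the true coefficients. With noise, the coefficients are expected to shrink toward zero while the time/cost ratio stays close to its true value.

```python
import math
import random


def simulate(n, sigma_t, beta, seed=0):
    """Hypothetical synthetic binary mode choice (alternative 1 vs 0).

    beta = (asc, beta_time, beta_cost). The modeller records the time
    difference dt_obs; the individual perceives dt_obs + eta, with
    eta ~ N(0, sigma_t) independent of dt_obs (assumed error structure).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dt_obs = rng.uniform(-30.0, 30.0)              # minutes, as recorded by the modeller
        dc = rng.uniform(-5.0, 5.0)                    # cost difference
        dt_true = dt_obs + rng.gauss(0.0, sigma_t)     # time actually perceived
        v = beta[0] + beta[1] * dt_true + beta[2] * dc
        # Logit choice: the difference of two i.i.d. Gumbel errors is logistic.
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
        rows.append(((1.0, dt_obs, dc), y))            # the modeller only sees dt_obs
    return rows


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(rows, max_iter=50, tol=1e-10):
    """Binary logit maximum likelihood by Newton-Raphson."""
    k = len(rows[0][0])
    b = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]   # information matrix (minus the Hessian)
        for x, y in rows:
            v = sum(bj * xj for bj, xj in zip(b, x))
            p = 1.0 / (1.0 + math.exp(-v))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


if __name__ == "__main__":
    true = (0.5, -0.1, -0.5)   # illustrative asc, beta_time, beta_cost
    for sigma in (0.0, 15.0):
        est = fit_logit(simulate(20000, sigma, true, seed=1))
        print(sigma, [round(e, 3) for e in est], "time/cost ratio:", round(est[1] / est[2], 3))
```

Adding an ECL estimator on top of this would require simulated maximum likelihood. The sketch stops at the MNL baseline to keep it dependency-free.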

The results show that some versions of the ML, such as the ECL, can recover the true parameters of the model, unlike the MNL. However, the popular MNL is fairly robust regarding marginal rates of substitution and demand forecasting when estimated with larger samples (>2000 observations), at least with the data sets generated in this investigation. Overall, the ECL shows the best performance in accounting for stochastic variables, except for models estimated with small samples (<500 observations). The findings of this paper can help to estimate robust discrete choice models for planning purposes when there is significant stochasticity in the explanatory variables.
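A heuristic reading of why the MNL can have biased parameters yet robust marginal rates of substitution is given below. This is an approximation under our own assumptions, not a result stated in the paper: utility is linear in time t and cost c, the deviation η_in of perceived from measured travel time has the same variance σ² for every alternative and is independent of the regressors, and the MNL absorbs the extra error variance into a single scale factor μ.

```latex
\hat{\beta}_t \approx \frac{\beta_t}{\mu}, \qquad
\hat{\beta}_c \approx \frac{\beta_c}{\mu}, \qquad
\mu = \sqrt{\frac{\mathrm{Var}[\beta_t \eta_{in} + \varepsilon_{in}]}{\mathrm{Var}[\varepsilon_{in}]}}
    = \sqrt{1 + \frac{6 \beta_t^2 \sigma^2}{\pi^2}} \;\ge\; 1

\widehat{\mathrm{MRS}}_{tc} = \frac{\hat{\beta}_t}{\hat{\beta}_c}
\approx \frac{\beta_t / \mu}{\beta_c / \mu} = \frac{\beta_t}{\beta_c}
```

When the variance of η_in differs between alternatives, no single μ rescales all utilities. The ratio is then only approximately preserved, and the individual coefficients remain biased, which the ECL is designed to correct.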


Association for European Transport