The Impact of Varying the Number of Repeated Choice Observations on the Mixed Multinomial Logit Model
J M Rose, ITLS Sydney, AU; S Hess, ITS, University of Leeds, UK; M C J Bliemer, TU Delft, NL; A Daly, RAND Europe, UK
This paper discusses sample size requirements for Mixed Logit estimation.
The mixed multinomial logit (MMNL) model is rapidly becoming one of the most widely used models within the choice modelling literature. This is because the MMNL model offers greater flexibility than other discrete choice models. Depending on how the model is specified, it allows for i) the incorporation of preference heterogeneity, ii) an accommodation of within-respondent correlation across repeated choice observations, and iii) non-constant error variances across alternatives via a relaxation of the IID assumption.
Whilst the increasing use of the MMNL model has benefited those researching in the area of discrete choice, estimation of the model comes with not only considerable cost, but also risk. For example, issues related to empirical identification of the model have been widely discussed in the literature, even though such issues appear to be largely ignored by many published papers. Other issues resulting directly from use of this model include possible confounding of income effects with random taste heterogeneity, and scale effects that result in potentially misleading random parameter estimates. Finally, Fosgerau amongst others has recently argued that where only a few observations per respondent are captured in choice data, whether stated choice (SC) or revealed preference (RP) data, it becomes increasingly difficult to distinguish between different error sources within the data, as well as to disentangle these error sources from parameterised preference heterogeneity.
While the existing theories relating to discrete choice data easily fit within the MMNL model framework, it is not certain that the properties of the model itself have been fully understood in terms of the number of observations required per individual to properly fit the model parameters. At issue is precisely what random parameter estimates represent when only a small number of observations are captured per respondent.
Methodologically, the MMNL model can be used to represent the situation in which only one observation is captured per respondent (cross-sectional MMNL), the case where a single respondent fills the whole data set (collapses to MNL), and any case in between (panel MMNL). In this paper, a number of hypotheses on the impacts of the number of choice observations per respondent in the estimation of the MMNL model are formulated. These hypotheses are then tested using Monte Carlo simulations. In particular, we explore the impact that capturing different numbers of observations per respondent has on the sample size requirements of MMNL models. We explore this issue not just in the context of stated choice data, but also for revealed preference data, where we vary not only the number of observations captured but also the range of variability in the data, in order to assess the impact that such variability has on the sample sizes required. We demonstrate that where only a few observations are captured per respondent, greater sample sizes are required in order to establish the statistical significance of the parameter estimates derived from the choice data. By understanding such issues, we provide recommendations on how best to sample, specifically when one anticipates using the MMNL model. This remains an important topic to examine given the dominance of this model in discrete choice modelling today.
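To make the panel structure described above concrete, the following is a minimal Python sketch of how repeated choice observations enter a panel MMNL simulated log-likelihood. It is an illustration under stated assumptions, not the simulation design used in the paper: the single attribute per alternative, the normally distributed coefficient, the values of mu and sigma, and the function names simulate_panel and panel_sll are all hypothetical choices made for this example.

```python
import math
import random


def logit_probs(beta, x):
    """MNL choice probabilities for utilities beta * x_j (numerically stabilised)."""
    v = [beta * xj for xj in x]
    m = max(v)
    e = [math.exp(vj - m) for vj in v]
    s = sum(e)
    return [ej / s for ej in e]


def simulate_panel(n_resp, n_obs, n_alt=3, mu=-1.0, sigma=0.5, seed=0):
    """Simulate n_obs choices for each of n_resp respondents.

    Assumption: each respondent draws one coefficient beta ~ N(mu, sigma)
    and keeps it across all of their choice observations.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_resp):
        beta = rng.gauss(mu, sigma)  # respondent-specific taste
        tasks = []
        for _ in range(n_obs):
            # One hypothetical attribute level per alternative
            x = [rng.uniform(0.0, 2.0) for _ in range(n_alt)]
            chosen = rng.choices(range(n_alt), weights=logit_probs(beta, x))[0]
            tasks.append((x, chosen))
        data.append(tasks)
    return data


def panel_sll(data, mu, sigma, n_draws=200, seed=1):
    """Simulated log-likelihood of a panel MMNL with one normal coefficient.

    For each respondent, the product of logit probabilities over that
    respondent's observations is averaged over draws of beta, so the
    same draw applies to the whole sequence of choices.
    Simplification: the same standard normal draws are reused for
    every respondent.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    total = 0.0
    for tasks in data:
        acc = 0.0
        for z in draws:
            beta = mu + sigma * z
            seq = 1.0
            for x, chosen in tasks:
                seq *= logit_probs(beta, x)[chosen]
            acc += seq
        total += math.log(acc / n_draws)
    return total


if __name__ == "__main__":
    # Hold the total number of observations fixed and vary how it is split
    # between respondents and observations per respondent.
    for n_resp, n_obs in [(400, 1), (100, 4), (25, 16)]:
        sample = simulate_panel(n_resp, n_obs)
        print(n_resp, n_obs, panel_sll(sample, mu=-1.0, sigma=0.5))
```

With n_obs set to 1 the likelihood reduces to the cross-sectional case, and with sigma set to 0 it reduces to the MNL log-likelihood. A Monte Carlo study along the lines described would wrap an estimator around panel_sll and repeat the simulation over many seeds, which this sketch does not attempt.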
Association for European Transport