On the Asymptotic Properties of Complex Discrete Choice Model Estimators
F Bastin, Universite de Montreal, CA; C Cirillo, University of Maryland, US
In this paper, we briefly review the relevant theoretical background on asymptotic analysis and provide numerical tests to assess the relevance of the problem.
To estimate the unknown parameters of utility functions, demand modellers usually draw finite samples of observations from a population assumed to be infinite. Several methods are then used to find estimators with known statistical properties: maximum likelihood, least squares, or squared moment conditions. More recently, simulated likelihood functions have been introduced to estimate more complex discrete choice models for which the choice probability has no closed-form expression.
For simple models, under some regularity conditions, the estimators are asymptotically normal: the suitably scaled difference between the estimator and its limit converges in distribution to a normal distribution centered on zero, with a variance-covariance matrix that can be derived from a simple Taylor expansion. This holds for correctly specified models and also for misspecified but consistent ones. Inconsistency usually makes asymptotic normality doubtful. In practice, researchers often rely on the information identity, which allows the variance-covariance matrix to be estimated on the basis of the Hessian alone. This identity, however, requires correct model specification.
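As an illustration of the statement above, the standard textbook form of this result can be written as follows. The notation is an assumption introduced here, not taken from the paper: $\hat{\theta}_N$ is the estimator from $N$ observations, $\theta^*$ its probability limit, $f$ the model density, $H$ the expected Hessian of the log-likelihood, and $B$ the expected outer product of the score.

```latex
% Notation assumed for illustration only.
\[
  \sqrt{N}\,\bigl(\hat{\theta}_N - \theta^*\bigr)
  \;\xrightarrow{d}\;
  \mathcal{N}\!\bigl(0,\; H^{-1} B H^{-1}\bigr),
\]
\[
  H = \mathbb{E}\!\left[\nabla^2_{\theta} \ln f(y \mid \theta^*)\right],
  \qquad
  B = \mathbb{E}\!\left[\nabla_{\theta} \ln f(y \mid \theta^*)\,
                        \nabla_{\theta} \ln f(y \mid \theta^*)^{\top}\right].
\]
% Under correct specification the information identity gives B = -H,
% so the "sandwich" collapses to a Hessian-only expression:
\[
  B = -H
  \quad\Longrightarrow\quad
  H^{-1} B H^{-1} = -H^{-1}.
\]
```

When the model is misspecified, $B \neq -H$ in general, and the Hessian-only expression no longer gives the asymptotic variance; the full sandwich form is then the appropriate one.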
Unfortunately, the more advanced models proposed during the last decade can make such statistical analysis unreliable, as the regularity conditions do not necessarily hold. In that case, alternative techniques, such as the bootstrap, have to be used to compute confidence intervals for the estimates.
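To make the bootstrap idea concrete, the following is a minimal sketch of nonparametric (percentile) bootstrap confidence intervals. It is not the procedure used in the paper: for brevity it assumes a simple binary logit fitted by Newton's method on synthetic data, and the function names `fit_logit` and `bootstrap_ci`, the sample size, and the parameter values are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum likelihood estimate of a binary logit via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # choice probabilities
        grad = X.T @ (y - p)                          # score of the log-likelihood
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X  # Hessian (negative definite)
        beta = beta - np.linalg.solve(hess, grad)     # Newton step
    return beta

def bootstrap_ci(X, y, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap confidence intervals, one per parameter."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample observations with replacement
        draws[b] = fit_logit(X[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    # Row 0 holds the lower bounds, row 1 the upper bounds.
    return np.quantile(draws, [alpha, 1.0 - alpha], axis=0)

# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, y)
lower, upper = bootstrap_ci(X, y)
for k in range(len(beta_hat)):
    print(f"beta[{k}] = {beta_hat[k]:.3f}, 95% CI = [{lower[k]:.3f}, {upper[k]:.3f}]")
```

The percentile interval is read directly from the empirical distribution of the re-estimated parameters, so it does not rely on the asymptotic normal approximation. For a simulated-likelihood model, `fit_logit` would be replaced by the corresponding estimation routine, at a much higher computational cost per resample.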
To illustrate these technical difficulties, we consider mixed logit models, especially those specified with random coefficients. The coefficients are assumed to vary over the population according to an underlying density function, which can be parametric or nonparametric. The number of parameters to be estimated for each attribute depends on the distributional form: normal and log-normal distributions require the estimation of the mean and standard deviation; the truncated normal also requires the estimation of the mass(es); the Johnson Sb requires the estimation of the part-worths; nonparametric models require a number of parameters that depends on the support points chosen by the analyst.
These additional complexities might lead to confusing quality measurements and to misinterpretation of the estimates obtained by applying classical statistical techniques. We first illustrate that assigning several parameters to the same attribute creates correlations between the estimates. A correct calculation of the variance-covariance matrix is also relevant for the construction of optimal experimental designs in stated choice data collection. The majority of the methods presented in the literature propose minimizing the standard errors obtained from the collected data, and hence maximizing the asymptotic t-statistics. This, however, ignores correlation effects, which can diminish the explanatory power of a model.
Finally, some models impose constraints on the parameters to be estimated, voiding some basic assumptions behind asymptotic normality. The computation of the usual t-statistics is then no longer possible, and we have to turn to empirical distribution techniques to study the properties of the estimators.
In this paper, we briefly review the relevant theoretical background on asymptotic analysis and provide numerical tests to assess the relevance of the problem. For each issue identified, we discuss how to correctly analyse the estimated parameters and the implications for practical model formulation.
Association for European Transport