A Practical Test for the Choice of Mixing Distribution in Discrete Choice Models
Mogens Fosgerau, Danish Transport Research Institute, DK; Michel Bierlaire, Ecole Polytechnique Federale de Lausanne, CH
We propose a practical test for the choice of mixing distribution in a discrete choice model, based on seminonparametric techniques. The test is evaluated on both synthetic and real data, and is shown to be simple and powerful.
The choice of a specific distribution for the random parameters of discrete choice models is a critical issue in transportation analysis. Indeed, various pieces of research have demonstrated that an inappropriate choice of distribution may lead to serious biases in model forecasts and in the estimated means of random parameters. A number of papers have sought to find flexible distributions with desirable properties. However, the quest for suitable distributions has been hampered by the inability to perform statistical tests of distributions against general alternatives.
In this paper, we propose a practical test based on seminonparametric (SNP) techniques. The main idea is that any (unknown) distribution G can be expressed in terms of a known distribution F and an (unknown) distribution Q on the unit interval by G(w)=Q(F(w)). The density of Q may be expressed as a squared series of Legendre polynomials, which is an SNP approximation that can approximate virtually any distribution. The number of SNP terms in the series may be increased arbitrarily in order to achieve the desired level of flexibility. The known distribution F can now be tested against the general alternative by testing the restriction that Q is the uniform distribution. This amounts to testing whether the SNP parameters of the series are all zero, which is merely a standard parameter restriction test.
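The construction above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the paper: it assumes the density of Q is written as q(x) = (1 + sum_k delta_k L_k(x))^2 / (1 + sum_k delta_k^2), where L_k are Legendre polynomials shifted to the unit interval and scaled to be orthonormal. The normalization and the names shifted_legendre, snp_density and delta are assumptions introduced here. Under that assumption, delta = 0 gives q(x) = 1, the uniform density, so G = F.

```python
import math


def shifted_legendre(k_max, x):
    """Return [L_0(x), ..., L_k_max(x)]: Legendre polynomials shifted to
    [0, 1] and scaled so that they are orthonormal on that interval."""
    t = 2.0 * x - 1.0  # map [0, 1] onto [-1, 1]
    p = [1.0, t]       # P_0 and P_1 on [-1, 1]
    for n in range(1, k_max):
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) t P_n - n P_{n-1}
        p.append(((2 * n + 1) * t * p[n] - n * p[n - 1]) / (n + 1))
    return [math.sqrt(2 * k + 1) * p[k] for k in range(k_max + 1)]


def snp_density(x, delta):
    """Assumed SNP density of Q at x in [0, 1].

    delta holds the SNP parameters delta_1, ..., delta_K (hypothetical
    name). Squaring the series keeps the density non-negative; dividing by
    1 + sum(delta_k^2) makes it integrate to one, by orthonormality.
    """
    basis = shifted_legendre(len(delta), x)
    series = 1.0 + sum(d * b for d, b in zip(delta, basis[1:]))
    return series ** 2 / (1.0 + sum(d * d for d in delta))
```

Since G(w) = Q(F(w)), the implied mixing density is g(w) = q(F(w)) f(w), where f is the density of F. For a normal base distribution, for example, q would be evaluated at the normal CDF of w and multiplied by the normal density.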
The test is applied to both synthetic and real data, and is shown to be simple and powerful. In the simulation study we generated 100 synthetic datasets for each of the four cases in a two-by-two design: either a true normal or lognormal distribution and either an assumed normal or lognormal distribution. Thus, for example, we generated data using a normal distribution and tested whether a lognormal distribution could be accepted. The results show that the power of the test is high: the share of false hypotheses rejected is close to 100%. The actual size of the test varies and may exceed the nominal size, meaning that sometimes more than 5% of true hypotheses were rejected when testing at the 5% level. Including 2 or 3 SNP terms should be sufficient for most purposes.
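As an illustration of such a parameter restriction test, the sketch below assumes a likelihood ratio test, which is one standard choice; the text above does not say which test statistic was used. The function names, the log-likelihood arguments and the numbers in the usage note are hypothetical. The restricted model uses F alone (all SNP parameters fixed at zero), the unrestricted model adds the SNP terms, and the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of SNP terms.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with
    integer df >= 1, built from the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(loglik_restricted, loglik_snp, n_snp_terms, level=0.05):
    """Likelihood ratio test of the restriction that all SNP parameters
    are zero (hypothetical helper, not part of any library).

    Returns the statistic, its p-value, and whether F is rejected.
    """
    stat = 2.0 * (loglik_snp - loglik_restricted)
    p_value = chi2_sf(max(stat, 0.0), n_snp_terms)
    return stat, p_value, p_value < level
```

With made-up log-likelihoods of -1000.0 for the restricted model and -990.0 for a model with 2 SNP terms, lr_test(-1000.0, -990.0, 2) yields a statistic of 20.0, which rejects the assumed distribution at the 5% level.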
The application to real data employed a dataset from the Danish value of time study. In preference space we rejected both the normal and the lognormal distributions. In log willingness-to-pay space we could accept the normal distribution, and we conclude that the application of the test to real data was successful.
The test has been implemented in BioGeme and is thus generally available.
Association for European Transport