Estimating Independent and Simultaneous Trip Frequency Models for All Travel Purposes with Combined Logit/Poisson

O I Larsen, Molde University College and Molde Research, NO



This paper presents an attempt to use behavioural assumptions from discrete choice models as a basis for the specification of a Poisson model for count data. It includes a theoretical part dealing with the basic assumptions. A key assumption is that the expected number of Poisson events is a function of the expected surplus of the 'best' potential event, subject to the condition that the surplus is non-negative. 'Surplus' is defined in the spirit of discrete choice models as consisting of a deterministic component and a stochastic component that is assumed to be Gumbel distributed. Expressions for this conditional surplus are derived under two alternative assumptions. In both cases it is assumed that the surplus of an event consists of two terms, 'utility of a visit' less 'opportunity cost'. The first assumption is that both terms include a Gumbel-distributed stochastic component, so that the 'surplus' has a logistic distribution. The probability that the 'surplus' of the best event is non-negative can then be formulated as a binary logit model. The second assumption is that only one of the terms is stochastic, with an assumed Gumbel distribution, in which case the probability is derived directly from the Gumbel distribution. At this stage it is also possible to assume a normal distribution for the stochastic terms, which would result in a binary probit model, but this is not pursued in the paper.
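The two first-stage probabilities described above can be illustrated with a minimal sketch. It assumes standard Gumbel errors (location 0, scale 1) and a single deterministic surplus `v`; the function names and the unit scale are illustrative assumptions, not the paper's specification.

```python
import math


def prob_nonneg_logit(v: float) -> float:
    """P(v + e1 - e2 >= 0) when e1 and e2 are i.i.d. standard Gumbel.

    The difference of two i.i.d. Gumbel variables is logistic, so the
    probability takes the binary logit form.
    """
    return 1.0 / (1.0 + math.exp(-v))


def prob_nonneg_gumbel(v: float) -> float:
    """P(v + e >= 0) when e is standard Gumbel with CDF F(x) = exp(-exp(-x)).

    P(e >= -v) = 1 - F(-v) = 1 - exp(-exp(v)).
    """
    return 1.0 - math.exp(-math.exp(v))


if __name__ == "__main__":
    # Compare the two forms over a few hypothetical surplus values.
    for v in (-2.0, 0.0, 2.0):
        print(v, prob_nonneg_logit(v), prob_nonneg_gumbel(v))
```

The logit form is symmetric around `v = 0`, while the Gumbel form is not, which is one way the two specifications can differ in practice.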

In the second stage a truncated Poisson model is used for the probability distribution if one or more events are observed. The mean of the truncated Poisson is then defined as a function of the conditional expected surplus of the best event. The two stages are estimated simultaneously with maximum likelihood. The combined distribution is more flexible than a standard Poisson distribution and allows for both over- and underdispersion relative to the mean, on the individual as well as on the aggregate level. The approach suggested is general in the sense that it might be applicable to different types of count data when we can assume that the observations are generated by maximising agents. It might be an alternative to other distributions for count data, e.g. the negative binomial distribution, when a standard Poisson model is too restrictive.1) The work reported here is part of the initial testing of model specification and estimation for a new nationwide model for short trips (less than 100 kilometres one way), which will also include mode and destination choice. The approach suggested above was used to estimate 'trip frequency' models on a sub-sample from a national travel survey. Contrary to what is usually the case in travel demand modelling, the unit of counts is neither trips nor tours, but the number of places visited (excluding the respondent's own home) during a day. The translation of visits into tours with one or more destinations then becomes a necessary part of a complete model system, but this issue is dealt with in a separate paper.
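The two-stage structure (a binary stage for whether any event occurs, then a zero-truncated Poisson for the positive counts) can be sketched as a per-observation log-likelihood. This is a generic illustration, not the paper's parameterisation: `p_pos` (the first-stage probability of at least one event) and `lam` (the parameter of the underlying Poisson before truncation) are taken as given inputs, whereas the paper links both to the expected surplus.

```python
import math


def truncated_poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k | K >= 1) for K ~ Poisson(lam)."""
    if k < 1:
        raise ValueError("zero-truncated Poisson is defined for k >= 1")
    log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
    # Divide by P(K >= 1) = 1 - exp(-lam).
    return log_poisson - math.log1p(-math.exp(-lam))


def two_stage_loglik(k: int, p_pos: float, lam: float) -> float:
    """Log-likelihood of one observed count under the combined model."""
    if k == 0:
        return math.log1p(-p_pos)  # no event observed
    return math.log(p_pos) + truncated_poisson_logpmf(k, lam)


def total_loglik(counts, p_pos_list, lam_list) -> float:
    """Sum over observations; both stages enter one objective, so they
    can be maximised simultaneously."""
    return sum(
        two_stage_loglik(k, p, lam)
        for k, p, lam in zip(counts, p_pos_list, lam_list)
    )
```

When `p_pos` equals `1 - exp(-lam)` the combined distribution collapses to a standard Poisson; letting `p_pos` deviate from that value in either direction is what permits over- or underdispersion relative to the mean.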

Models for 5 travel purposes are estimated. For each model, specifications with 'straight' Poisson, Logit/Poisson and Gumbel/Poisson are compared with respect to both statistical fit and predictive performance. For 3 out of 5 purposes the conclusion is that both the Gumbel/Poisson and the Logit/Poisson can be accepted as the model that has generated the observations. Overall, both the Gumbel/Poisson and the Logit/Poisson perform considerably better than a standard Poisson model, while the differences between the former two are marginal.

Both a priori considerations and the results from independent estimation suggest that some correlation exists for the error terms between the number of visits with different purposes. This shows up when the correlation matrix for standardised residuals is calculated for the 5 purposes and indicates that independent estimation of models for different purposes might introduce bias in the separate models.

To overcome the correlation problem, two methods, which do not exhaust the possibilities, are suggested. One is to estimate conditional models along the lines suggested above, i.e. for each purpose to estimate a model conditioned on the number of visits made with other purposes. While this is easy in terms of estimation, implementing the models afterwards might pose some problems. The effects of conditional estimation are reported and, as expected, the models perform somewhat better.

The second approach is to use Logit/Poisson for the total number of visits and a multinomial model for the distribution between purposes. The probabilities in the multinomial model are then formulated as logit probabilities, and the logsum from the multinomial model enters as an explanatory variable in the Logit/Poisson for the total number of visits. For 'straight' Poisson models it is a well-known fact that the joint distribution of independent Poisson variables is equivalent to a Poisson distribution for the sum, with mean equal to the sum of the means, multiplied by a multinomial distribution conditioned on the sum of the variables.
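The Poisson/multinomial equivalence cited above can be checked numerically. The sketch below evaluates both sides of the identity in logs for an arbitrary set of counts and means; the function names and the example values are illustrative only.

```python
import math


def log_poisson(k: int, lam: float) -> float:
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_joint_independent(counts, means) -> float:
    """Log of the joint probability of independent Poisson counts."""
    return sum(log_poisson(k, m) for k, m in zip(counts, means))


def log_total_times_multinomial(counts, means) -> float:
    """Log of Poisson(total; sum of means) times the multinomial
    probability of the split, with shares mean_j / sum of means."""
    n = sum(counts)
    mean_sum = sum(means)
    log_total = log_poisson(n, mean_sum)
    log_multinomial = (
        math.lgamma(n + 1)
        - sum(math.lgamma(k + 1) for k in counts)
        + sum(k * math.log(m / mean_sum) for k, m in zip(counts, means))
    )
    return log_total + log_multinomial


if __name__ == "__main__":
    counts = (2, 0, 1, 3, 0)            # hypothetical visits by purpose
    means = (0.8, 0.3, 0.5, 1.1, 0.2)   # hypothetical Poisson means
    print(log_joint_independent(counts, means))
    print(log_total_times_multinomial(counts, means))
```

The two printed values agree up to floating-point error, which is the property that motivates modelling the total and the split between purposes separately.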

The extension to Logit/Poisson with modifications is somewhat ad hoc, but can also be combined with 'nesting' of purposes and is thus, in a sense, more in the spirit of traditional discrete choice models. The option to include 'nesting' of purposes can probably be an advantage in some cases. A multinomial model that distributes visits between purposes automatically introduces a negative correlation between the number of visits with different purposes.

A simultaneous model of this type is formulated and estimated with maximum likelihood and seems to perform well. Such a model is comparatively easy to implement in a comprehensive system of models. The expected number of visits for different purposes can be calculated by multiplying the expected total number of visits from the Logit/Poisson model by the probabilities from the multinomial model. It turns out that the estimated model performs quite well in terms of reproducing the original data, both in aggregate and disaggregated by segment.
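The calculation of expected visits by purpose can be sketched as follows. The purpose utilities and the expected total are hypothetical inputs, and the plain (non-nested) multinomial logit form is an assumption made for illustration; the paper's estimated specification is not reproduced here.

```python
import math


def mnl_shares(utilities):
    """Multinomial logit probabilities for a list of purpose utilities."""
    u_max = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - u_max) for u in utilities]
    denom = sum(exp_u)
    return [e / denom for e in exp_u]


def logsum(utilities):
    """Log of the logit denominator, usable as an explanatory variable
    in the model for the total number of visits."""
    u_max = max(utilities)
    return u_max + math.log(sum(math.exp(u - u_max) for u in utilities))


def expected_visits_by_purpose(expected_total, utilities):
    """Expected total visits multiplied by each purpose's logit share."""
    return [expected_total * p for p in mnl_shares(utilities)]


if __name__ == "__main__":
    utilities = [0.4, -0.2, 0.1, -1.0, 0.0]  # hypothetical, 5 purposes
    print(logsum(utilities))
    print(expected_visits_by_purpose(1.6, utilities))
```

Because the shares sum to one, the purpose-specific expectations always add up to the expected total, which keeps the two parts of the model consistent when applied to segments.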

1) See e.g. Cameron and Trivedi (1986) and Hausman, Hall and Griliches (1984).


Association for European Transport