## The Estimation of Aggregate Modal Split Models

### Authors

P Bonnel, Laboratoire d'Economie des Transports - ENTPE, FR

### Abstract

Although disaggregate modelling has undergone considerable development in the last twenty years, many studies are still based on aggregate modelling. In France, for example, aggregate models are still much more commonly used than disaggregate models, even for modal split. The estimation of aggregate models therefore remains an important issue.

In France, most studies can draw on behavioural data from household surveys, which are conducted every ten years in most French conurbations. These surveys provide data on the socioeconomic characteristics of individuals and of the households to which they belong, together with data on modal choice for all the trips made the day before the survey. The sampling rate is generally 1% of the population, which gives about 50,000 trips for a conurbation of 1 million inhabitants. However, matrices that contain several hundred rows and columns are frequently used. We therefore have to construct several modal matrices of at least 10,000 cells each (the figure for a small matrix with only 100 rows and columns) from fewer than 50,000 trips (to take the above example). Obviously, the matrices will contain a large number of empty cells and the precision of almost all the cells will be very low. It is consequently not possible to estimate the model at this level of zoning.
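The order of magnitude of the problem can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (50,000 surveyed trips, 100 zones); the function name and the 300-zone case are illustrative assumptions, not taken from the study.

```python
def mean_trips_per_cell(surveyed_trips, zones):
    """Average number of surveyed trips per cell of a zones x zones O-D matrix."""
    cells = zones * zones
    return surveyed_trips / cells

# Figures from the text: about 50,000 surveyed trips, a "small" matrix of 100 zones.
small = mean_trips_per_cell(50_000, 100)   # 10,000 cells, 5.0 trips per cell on average

# Hypothetical zoning with several hundred zones (300 is an assumed value).
large = mean_trips_per_cell(50_000, 300)   # 90,000 cells, fewer than one trip per cell
```

These averages apply to all modes together. Once the trips are split into one matrix per mode, each cell holds fewer observations still.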

The solution generally chosen is to aggregate zones. The aggregation must reconcile two conflicting objectives:

* the number of zones must be as small as possible in order to increase the number of surveyed trips that can be used during estimation and hence the accuracy of the O-D matrices for trips conducted on each mode;

* the zones must be as small as possible in order to produce accurate data for the explanatory variables, such as the generalized cost for each of the transport modes considered. As the size of a zone increases, it becomes more difficult to evaluate the access and egress times for public transport, and there are several alternative routes with different travel times between each origin zone and each destination zone. More uncertainty is therefore associated with the generalized cost that represents the quality of service available between the two zones. The generally adopted solution is to produce a weighted average of all the generalized costs computed from the most disaggregated matrix. However, there is no guarantee that this weighted mean will be accurate for the origin-destination pair in question.
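The weighted average mentioned in the second objective can be sketched as follows. The text does not say which weights are used; this example assumes the weights are trip counts on the fine zoning. The function name, argument names and data layout are hypothetical.

```python
def aggregate_costs(fine_costs, fine_weights, zone_of):
    """Weighted mean generalized cost per aggregated O-D pair.

    fine_costs:   {(i, j): generalized cost between fine zones i and j}
    fine_weights: {(i, j): weight of that fine O-D pair (assumed: trip count)}
    zone_of:      {fine zone: aggregated zone it belongs to}
    """
    weighted_sum, weight_total = {}, {}
    for (i, j), cost in fine_costs.items():
        key = (zone_of[i], zone_of[j])
        w = fine_weights.get((i, j), 0.0)
        weighted_sum[key] = weighted_sum.get(key, 0.0) + w * cost
        weight_total[key] = weight_total.get(key, 0.0) + w
    # Aggregated pairs with zero total weight have no defined mean and are omitted.
    return {k: weighted_sum[k] / weight_total[k]
            for k in weighted_sum if weight_total[k] > 0}
```

A single mean can hide a wide spread: if the fine costs inside one aggregated pair range from 10 to 40, the mean may describe none of the actual trips well, which is the uncertainty the text refers to.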

When the best compromise has been made, some of the matrix cells are generally empty or suffer from an insufficient level of precision. To deal with this problem we generally keep only the cells for which the data is sufficiently precise by selecting those cells in which the number of surveyed trips exceeds a certain threshold. However, this process involves rejecting part of the data which cannot be used for estimation purposes. When a fairly large number of zones is used, the origin-destination pairs which are selected for the estimation of the model mainly involve trips that are performed in the centre of the conurbation or radial trips between the centre and the suburbs. These origin-destination pairs are also those for which public transport's share is generally the highest. The result is to reduce the variance of the data and therefore the quality of the estimation.
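The cell selection used in this classical approach, and the loss of data it causes, can be illustrated with a minimal sketch. The function name and data layout are assumptions made for the example; the threshold value is left to the analyst, as in the text.

```python
def select_cells(trip_counts, min_trips):
    """Keep the O-D cells whose surveyed trip count exceeds min_trips.

    trip_counts: {(origin, destination): number of surveyed trips}
    Returns the retained cells and the share of surveyed trips they represent.
    """
    kept = {od: n for od, n in trip_counts.items() if n > min_trips}
    total = sum(trip_counts.values())
    share_used = sum(kept.values()) / total if total else 0.0
    return kept, share_used
```

The second return value makes the cost of the selection explicit: `1 - share_used` is the fraction of surveyed trips that plays no part in the estimation.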

To cope with this problem we propose a different aggregation process which makes it possible to retain all the trips and use a more disaggregate zoning system. The principle of the method is very simple. We apply it to the model most commonly used for modal split, the logit model. When there are only two modes of transport, the share of each mode is obtained directly from the difference in utility between the two modes through the logit function. We can therefore aggregate the origin-destination pairs whose utility differences are very close to one another in order to obtain enough surveyed trips to ensure sufficient data accuracy. This process is justified by the fact that the data used to calculate the utility of each mode is generally as accurate, or even more accurate, at a more disaggregate level of zoning. The difficulty is that the utility differences used to form the groups depend on the utility function coefficients, which are precisely what the estimation of the logit model is meant to produce. An iterative process is therefore necessary. The steps of the method are summarised below:

* initial values are selected for the utility function coefficients of the two transport modes in order to start the iteration process. These values can, for example, be obtained from a previous study or from a calibration performed according to the classical method described in Section 1.2;

* the utility of each mode is computed on the basis of the above coefficients, followed by the difference in the utility for each O-D pair in the smallest-scale zoning system for which explanatory variables with an adequate level of accuracy are available (therefore with very limited zonal aggregation or even none at all);

* the O-D pairs are classified on the basis of increasing utility difference;

* the O-D pairs are then aggregated on the basis of closeness of utility difference. The method involves taking the O-D pair with the smallest utility difference and combining it with the next O-D pair (in order of increasing utility difference). This process is continued until the number of surveyed trips in the grouping is greater than a threshold value, which is chosen according to the level of accuracy required for trip flow estimation. When this threshold is reached, the construction of the second grouping begins, and so on until each O-D pair has been assigned to a group;

* for each new class of O-D pairs, the values of the explanatory variables which make up the utility functions are computed. Each value is obtained as the weighted average of the values for the O-D pairs in the class;

* the utility function coefficients are re-estimated.
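The sorting, grouping and averaging steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the utility difference is reduced to a constant plus one coefficient times a generalized cost difference, the weights of the average are surveyed trip counts, and leftover pairs that never reach the threshold are merged into the last group. All names (`group_od_pairs`, `cost_diff`, `pt_trips`) are hypothetical.

```python
def group_od_pairs(od_pairs, coeffs, min_trips):
    """Group O-D pairs of similar utility difference into classes.

    od_pairs: list of dicts with keys
        "cost_diff": difference in generalized cost between the two modes,
        "trips":     surveyed trips on the pair (both modes),
        "pt_trips":  surveyed trips on the pair by public transport.
    coeffs: (constant, slope) of the assumed linear utility difference.
    """
    const, slope = coeffs
    # Compute the utility difference and sort the pairs in increasing order.
    ranked = sorted(od_pairs, key=lambda od: const + slope * od["cost_diff"])

    groups, current, current_trips = [], [], 0
    for od in ranked:
        current.append(od)
        current_trips += od["trips"]
        if current_trips > min_trips:      # threshold reached: close the group
            groups.append(current)
            current, current_trips = [], 0
    if current:                            # leftover pairs (assumed handling)
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)

    classes = []
    for group in groups:
        trips = sum(od["trips"] for od in group)
        if trips == 0:
            continue                       # no observation, nothing to estimate
        classes.append({
            # Trip-weighted mean of the explanatory variable (assumed weights).
            "cost_diff": sum(od["cost_diff"] * od["trips"] for od in group) / trips,
            "trips": trips,
            "pt_share": sum(od["pt_trips"] for od in group) / trips,
        })
    return classes
```

Because every O-D pair ends up in exactly one class, the total number of surveyed trips across the classes equals the total in the input, which is the property that distinguishes this grouping from threshold-based cell selection.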

This process is repeated until the values of the utility function coefficients converge. We have tested this method for the Lyon conurbation with data from the most recent household travel survey conducted in 1995/96. We have conducted a variety of tests in order to identify the best application of the method and to test the stability of the results. It would seem that this method always produces better results than the more traditional method that involves zoning aggregation. The paper presents both the methodology and the results obtained from different aggregation methods. In particular, we analyse how the choice of zoning system affects the results of the estimation.
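The outer iteration can be sketched as below. The abstract does not state which estimator is used, so this example substitutes a simple one: weighted least squares on the log-odds of the grouped shares (Berkson's method), with a single explanatory variable. The grouping step is passed in as a function `group`, standing for any implementation of the aggregation procedure. All names, the tolerance and the iteration cap are assumptions.

```python
import math

def logit_share(utility_diff):
    """Binary logit: share of the first mode given its utility advantage."""
    return 1.0 / (1.0 + math.exp(-utility_diff))

def fit_binary_logit(classes):
    """Weighted least squares fit of ln(p / (1 - p)) = const + slope * cost_diff.

    Stand-in estimator (Berkson's method), not necessarily the one used in
    the study. Classes with a share of exactly 0 or 1 are skipped because
    their log-odds are undefined.
    """
    sw = sx = sy = sxx = sxy = 0.0
    for c in classes:
        p, n, x = c["pt_share"], c["trips"], c["cost_diff"]
        if not 0.0 < p < 1.0:
            continue
        w = n * p * (1.0 - p)              # inverse variance of the log-odds
        y = math.log(p / (1.0 - p))
        sw += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
    denom = sw * sxx - sx * sx
    if denom == 0.0:
        raise ValueError("not enough distinct classes to fit two coefficients")
    slope = (sw * sxy - sx * sy) / denom
    const = (sy - slope * sx) / sw
    return const, slope

def estimate_iteratively(od_pairs, init_coeffs, group, min_trips,
                         tol=1e-6, max_iter=50):
    """Alternate grouping and estimation until the coefficients stabilise.

    group(od_pairs, coeffs, min_trips) must return a list of classes with
    keys "cost_diff", "trips" and "pt_share". Returns (coeffs, converged).
    """
    coeffs = init_coeffs
    for _ in range(max_iter):
        classes = group(od_pairs, coeffs, min_trips)
        new_coeffs = fit_binary_logit(classes)
        change = max(abs(new - old) for new, old in zip(new_coeffs, coeffs))
        coeffs = new_coeffs
        if change < tol:
            return coeffs, True
    return coeffs, False                   # cap reached without convergence
```

The iteration cap is a precaution: since the grouping changes in discrete steps when the ordering of O-D pairs changes, the coefficients may oscillate between a few values rather than settle, and the `converged` flag lets the caller detect this.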

#### Publisher

Association for European Transport