Data Collection Biases in a Transport Survey


L Christensen, Danish Transport Research Institute, DK


For the Danish National Travel Survey, the paper investigates to what extent interviewer performance and changes in calling time and calling routine can explain an unexpected increase in the probability of reporting no trips and a decrease in the number of trips.


Transport models are no better than the data they are based on. Activity-based models in particular depend on the quality of transport surveys such as the national transport surveys. This paper discusses data quality problems, taking the ongoing Danish Transport Survey as its starting point.

The Danish National Transport Survey has been running since 1993 with about 14,000 telephone interviews per year, but the observed average distance travelled (ADT) is not developing as would be expected from other data sources, for instance car traffic counts.

A more thorough investigation showed that the most important change was an increase in the share of respondents staying at home on the interview day, from around 15% of those interviewed in 1997 to around 25% in 2001. The trip rate for persons with trips decreased over the same period. Neither of these changes was expected. The ADT for persons with trips, on the other hand, did not decrease. From 2002 the trip rate and the zero trip rate returned to the 1998 level.

The purpose of the paper is to explain the changes over the years in terms of interviewer performance and changes in the data collection methodology. Problems with soft refusal are discussed, and changes related to the introduction of cell phones as an interview medium are analysed.

A further purpose is to make recommendations on how to avoid the problems related to data collection biases.

The main reason for the observed changes in the zero trip rate and the trip rate seems to be differences in interviewer performance. The results emphasise the importance of intensive monitoring and quality control of interviewer results. Interviewer fatigue seems to be important to take into account when carrying out such surveys over a long period of time.

Changes in data collection methodology contribute less to explaining the observed changes, since the distribution of interviews over the day and week has been relatively stable. However, the results indicate that maintaining this stable distribution is important, since interview results depend strongly on both the time of day and the day of the week. Changes in the distribution may invalidate comparisons over time.

The introduction of calls to mobile telephones has changed both the contact pattern and the results of the interviews.

Two multiple regression models are constructed:

· A logistic regression model on the probability of staying at home
· A Poisson regression model for the trip rate for persons with trips.
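
As a rough illustration of what these two model forms compute, the sketch below evaluates a logistic probability and a Poisson (log-link) mean for one respondent. It is not the paper's estimated model: the covariate names and all coefficient values are hypothetical placeholders chosen only so that the example runs.

```python
import math

def linear_predictor(coefs, x):
    """Intercept plus the sum of coefficient * covariate value."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in x.items())

def prob_stay_home(coefs, x):
    """Logistic model: probability of having no trips on the interview day."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(coefs, x)))

def expected_trips(coefs, x):
    """Poisson model with log link: expected number of trips."""
    return math.exp(linear_predictor(coefs, x))

# Hypothetical coefficients, NOT estimates from the survey.
HOME_COEFS = {"intercept": -1.5, "year_2001": 0.6, "flagged_interviewer": 0.8}
TRIP_COEFS = {"intercept": 1.2, "year_2001": -0.1, "flagged_interviewer": -0.2}

# One hypothetical respondent, interviewed in 2001 by a non-flagged interviewer.
respondent = {"year_2001": 1, "flagged_interviewer": 0}
p_home = prob_stay_home(HOME_COEFS, respondent)
mean_trips = expected_trips(TRIP_COEFS, respondent)
```

In both models a dummy variable shifts the linear predictor by its coefficient, so a positive year coefficient in the first model means a higher probability of staying at home in that year, all else equal.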

The data collection methodology is included in the models by two types of variables:

· Variables related to the calling time, for instance the type of day that the interview concerns, the type of day and hour at which the interview takes place, and monthly dummies
· Variables related to the calling routine, for instance the number of calls to the respondent before an interview is conducted, the number of calls per day, and the time passed since the last calling day. Mobile telephones are treated separately.
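
The variables listed above could be coded along the following lines. This is a minimal sketch under stated assumptions: the function name, the field names, the choice of January as reference month, the weekend definition and the 17:00 evening cut-off are all hypothetical and are not taken from the survey.

```python
from datetime import datetime

def encode_calling_variables(interview_time, calls_before_interview, is_mobile):
    """Turn one interview record into dummy and count variables.

    All names and thresholds are illustrative assumptions.
    """
    # Monthly dummies, with January as the (assumed) reference category.
    row = {f"month_{m}": 0 for m in range(2, 13)}
    if interview_time.month != 1:
        row[f"month_{interview_time.month}"] = 1
    # Type of day and hour of the interview (assumed definitions).
    row["weekend_call"] = int(interview_time.weekday() >= 5)  # Saturday or Sunday
    row["evening_call"] = int(interview_time.hour >= 17)
    # Calling routine: calls made before the interview was obtained.
    row["calls_before_interview"] = calls_before_interview
    # Mobile telephones are flagged separately.
    row["mobile"] = int(is_mobile)
    return row

# Hypothetical record: an interview on a Saturday evening in March, fourth call.
example_row = encode_calling_variables(datetime(2001, 3, 10, 18, 30), 3, False)
```

Leaving out one category per group (here January) avoids perfect collinearity with the intercept when the rows are used in a regression.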

Variables related to the calling routine are of special interest because the calling routine changed from 1997 to 1998 when the survey was spread out from one week per month to the whole month.

A dummy for each of 10 of the 70 interviewers is included to describe differences in interviewer performance. These 10 interviewers have a significantly higher share of respondents staying at home and lower trip rates than the majority of interviewers.

The interviewers know the age and sex of the respondent before deciding to make a call. This might influence the choice of whom to call. Therefore, age and sex are included in the models. Other socio-economic variables of the respondents are not included.

Finally, year dummies are included.

The objective of the analysis is to investigate whether the year dummies can be rendered insignificant by the other independent variables. If so, the annual differences are fully explained by those variables. Furthermore, the share of variation that can be attributed to the different independent variables may reveal their relative importance for the observed changes over the years.
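
One common way to check whether a set of year dummies is jointly significant is a likelihood-ratio test between the model with and without them. The source does not say which test was used, so the sketch below is only an illustration of that general technique. The function names are hypothetical, and the log-likelihoods are assumed to come from models fitted elsewhere.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), starting from
    Q(1/2, y) = erfc(sqrt(y)) for odd df or Q(1, y) = exp(-y) for even df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def year_dummies_lr_test(loglik_with_years, loglik_without_years, n_year_dummies):
    """Likelihood-ratio statistic and p-value for dropping the year dummies."""
    stat = max(2.0 * (loglik_with_years - loglik_without_years), 0.0)
    return stat, chi2_sf(stat, n_year_dummies)
```

A large p-value would be consistent with the year dummies adding nothing once the other variables are in the model. A small p-value would indicate annual differences that the other variables leave unexplained.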


Association for European Transport