Treatment and Cleaning Data in Mobility Surveys: the Case of EMEF in the Barcelona Metropolitan Region

Nominated for The Planning for Sustainable Land Use and Transport Award


Manel Pons, IERMB (Institut D'Estudis Regionals I Metropolitans De Barcelona), Elisabeth Queralt, IERMB (Institut D'Estudis Regionals I Metropolitans De Barcelona), Jorge Cátedra, IERMB (Institut D'Estudis Regionals I Metropolitans De Barcelona)


The paper explains the methodology for cleaning data in mobility surveys throughout the process of obtaining the information. This process consists of controlling the EMEF survey fieldwork from the beginning, detecting and correcting inconsistencies.


Mobility surveys are statistical instruments that provide information on how and why the population moves. They also serve as tools for analysing trends in mobility planning. With the data they provide, researchers can analyse mobility patterns according to the characteristics of the population and of different territories; planners and designers can plan infrastructure and services, calculate external effects, evaluate the results of implemented policies, and gauge citizens' attitudes towards future public policies.
Since 2003, an annual mobility survey called EMEF (Working Day Mobility Survey) has been conducted in the Barcelona metropolitan region. The EMEF series thus includes 14 consecutive years of indicators, not only on mobility patterns but also on residents' opinions of certain public mobility policies. The survey collects data using CATI methodology in two databases: the first holds demographic data and the opinions of individuals, and the second holds data on the trips made by the sample and their characteristics. In the 2016 edition, the EMEF obtained a sample of 9,601 interviewees and characterized 33,035 of their trips.
The information gathered is very sensitive at the individual level, because the survey collects personal information on how, when, how much, where and why individuals move. This information is also very dynamic, since there are many parameters to control depending on the characteristics of each individual, and this can affect the final results. The EMEF's data treatment controls not only the filters and the valid ranges of answers, but also the logical consistency of the reported behaviour. For instance, if an employed person did not go to work on the previous working day, he or she is asked why he or she did not attend the workplace. A situation of this kind may generate an inconsistency.
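The kind of logical check described above can be illustrated with a minimal Python sketch. It is not the EMEF's actual implementation: the record layout, the field names (`employed`, `reason_not_working`, `trip_purposes`) and the three rules are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Respondent:
    # Hypothetical record layout; these are not the EMEF's real field names.
    respondent_id: int
    employed: bool
    reason_not_working: Optional[str] = None  # None if the question was not answered
    trip_purposes: List[str] = field(default_factory=list)


def find_inconsistencies(r: Respondent) -> List[str]:
    """Return the logical inconsistencies found in one interview record."""
    issues = []
    has_work_trip = "work" in r.trip_purposes

    # Filter check: only employed people should be asked why they did not work.
    if not r.employed and r.reason_not_working is not None:
        issues.append("non-employed respondent has a reason for not working")

    # Logic check: an employed person with no work trip should have given a reason.
    if r.employed and not has_work_trip and r.reason_not_working is None:
        issues.append("employed, no work trip and no reason recorded")

    # Logic check: a work trip contradicts a stated reason for not working.
    if has_work_trip and r.reason_not_working is not None:
        issues.append("work trip recorded together with a reason for not working")

    return issues


# Illustrative records (invented for this example, not survey data).
consistent = Respondent(1, employed=True, trip_purposes=["work", "home"])
missing_reason = Respondent(2, employed=True, trip_purposes=["shopping", "home"])
contradictory = Respondent(3, employed=True, reason_not_working="holiday",
                           trip_purposes=["work", "home"])

for record in (consistent, missing_reason, contradictory):
    print(record.respondent_id, find_inconsistencies(record))
```

Records that return a non-empty list would be the ones passed to the supervision team for review with the respondent.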
To carry out these analyses and support subsequent decision making, it is necessary to collect information that is fully reliable, consistent and of high quality. For this reason, one of the main priorities of the survey's promoters and researchers has always been to avoid the effect known as GIGO (Garbage In, Garbage Out): no matter how good the techniques and methods of analysis, if the data are wrong the results will be erroneous. Therefore, since the beginning of the EMEF, the IERMB's researchers have developed a methodology of fieldwork quality control, a process of cleaning survey data that ensures the high quality of the information to be analysed later.
The paper explains the methodology throughout the process of obtaining the information. This process consists of controlling the fieldwork thoroughly and intensively from the beginning, through a supervision team. This control makes it possible to meet the study objectives and to satisfy the assumptions of the research with the highest-quality data possible, by detecting inconsistencies and errors and correcting them almost directly with respondents. A strong point of this method is that the consistency analysis is done in the first 24-48 hours after the interview is completed. This minimizes the time between the fact (the trip), the record (when the trip was described in the interview) and the correction of the inconsistency: if a long time passes between the interview and the resolution of the problem, the respondent's memory of the trip is less precise, producing a bias known as the telescoping effect. Working this way made it possible to solve 98.5% of the problems (EMEF 2016) and to keep the databases up to date.
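As a rough illustration of the timing constraint described above, the following Python sketch checks whether a consistency review falls within the 48-hour upper bound after an interview ends. The function names and the timestamps are hypothetical; the abstract does not describe how the supervision team actually tracks these deadlines.

```python
from datetime import datetime, timedelta

# Upper bound of the 24-48 hour window mentioned in the text.
REVIEW_WINDOW = timedelta(hours=48)


def review_deadline(interview_end: datetime) -> datetime:
    """Latest moment at which the consistency analysis should take place."""
    return interview_end + REVIEW_WINDOW


def is_within_window(interview_end: datetime, checked_at: datetime) -> bool:
    """True if the check happens after the interview and before the deadline."""
    return interview_end <= checked_at <= review_deadline(interview_end)


# Hypothetical timestamps, invented for illustration only.
interview_end = datetime(2016, 5, 10, 18, 0)
on_time_check = datetime(2016, 5, 11, 12, 0)   # 18 hours later
late_check = datetime(2016, 5, 13, 9, 0)       # 63 hours later

print(is_within_window(interview_end, on_time_check))
print(is_within_window(interview_end, late_check))
```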
The main conclusion is that applying the data-cleaning methodology simultaneously with the fieldwork makes it feasible to meet the schedule and fulfil the research plan made a priori, so it generates no costs beyond those estimated at the start of the study. Another important result is the involvement of the fieldwork company in the achievements through a win-win approach, since the company can improve its internal processes thanks to this control. This data-cleaning methodology is versatile and adaptable to other types of surveys, regardless of the method used to collect the data (CATI, CAPI, PAPI, CAWI, etc.).


Association for European Transport