Manuel Martin Salvador, We Are Base / Bournemouth University, Marcin Budka, Bournemouth University, Tom Quay, We Are Base


Online matching of spatio-temporal sequences to a transport network is not a trivial task. This paper presents a data-driven approach using deep learning, together with a case study based on real journey data from a UK-based transport operator.


Public transport users increasingly expect better service and up-to-date information, in pursuit of a seamless journey experience. To meet these expectations, many transport operators already offer free mobile apps that help customers plan their journeys and access real-time travel information. Leveraging the spatio-temporal data that such apps can produce at scale (i.e. timestamped GPS traces) opens an opportunity to bridge the gap between passenger expectations and the capabilities of operators by providing a real-time 360-degree view of the transport network based on the ‘Apps as infrastructure’ paradigm. The first step towards fulfilling this vision is to understand which routes and services passengers are travelling on at any given time.
Mapping a GPS trace onto a particular transport network is known as ‘network matching’. In this paper, the problem is formulated as a supervised sequence classification task, in which each sequence consists of timestamped geographic coordinates and is labelled with the line and direction of travel. We present and compare two data-driven approaches to this problem: (i) a heuristic algorithm, used as a baseline, which looks for nearby stops and makes an estimate based on their timetables, and (ii) a deep learning approach using a recurrent neural network (RNN). RNNs require considerable amounts of data to train a good model, and collecting and labelling such data from real users is challenging (e.g. prompting users too often can be overwhelming; users may have privacy concerns about sharing their GPS location; labels may be unreliable due to mistakes or misuse). One of our contributions is therefore a synthetic journey data generator. The generated datasets have been made as realistic as possible by querying real timetables and adding positional and temporal noise to simulate variable GPS accuracy and vehicle delays, with the noise sampled from empirical distributions estimated from thousands of real location reports. To validate our approach, we used a separate dataset of hundreds of real user journeys provided by a UK-based bus operator. Our experimental results are promising, and our next step is to deploy a solution in a production environment. From the operator’s point of view, this will enable multiple smart applications such as account-based ticketing, identification of disruptions, real-time passenger counting, and network analysis. Passengers, in turn, will benefit from better service and higher-quality information as a result of this large-scale data processing.
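To make the synthetic data generation step more concrete, the following is a minimal sketch of how one labelled journey could be produced from timetabled stop calls. It is an illustration under stated assumptions, not the generator used in this work: the names StopCall, LabelledTrace and synthesise_trace, the sample error and delay values, the coordinates, and the choice of a single delay per journey are all hypothetical. Noise is drawn by resampling lists of observed values, which stands in for the empirical distributions mentioned above.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

METRES_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


@dataclass
class StopCall:
    """A timetabled call at a stop (hypothetical structure)."""
    lat: float
    lon: float
    scheduled_s: float  # scheduled time, seconds since midnight


@dataclass
class LabelledTrace:
    """One training example: a sequence of points plus its class label."""
    points: List[Tuple[float, float, float]]  # (lat, lon, timestamp_s)
    label: Tuple[str, str]                    # (line, direction)


def synthesise_trace(
    calls: Sequence[StopCall],
    line: str,
    direction: str,
    gps_errors_m: Sequence[float],
    delays_s: Sequence[float],
    rng: random.Random,
) -> LabelledTrace:
    """Perturb timetabled calls with resampled GPS error and delay.

    Assumption: one delay applies to the whole journey, and each point
    is displaced by a resampled error distance in a uniformly random
    direction. A real generator could model both in more detail.
    """
    delay = rng.choice(delays_s)
    points = []
    for call in calls:
        dist = rng.choice(gps_errors_m)
        bearing = rng.uniform(0.0, 2.0 * math.pi)
        north_m = dist * math.cos(bearing)
        east_m = dist * math.sin(bearing)
        # Convert the metre offsets to degrees (small-offset approximation).
        lat = call.lat + north_m / METRES_PER_DEG_LAT
        lon = call.lon + east_m / (
            METRES_PER_DEG_LAT * math.cos(math.radians(call.lat))
        )
        points.append((lat, lon, call.scheduled_s + delay))
    return LabelledTrace(points=points, label=(line, direction))


# Illustrative values only; none of these come from the real datasets.
calls = [
    StopCall(50.72, -1.88, 28800.0),
    StopCall(50.73, -1.87, 28920.0),
    StopCall(50.74, -1.86, 29100.0),
]
trace = synthesise_trace(
    calls,
    line="line-1",
    direction="outbound",
    gps_errors_m=[5.0, 12.0, 30.0],
    delays_s=[0.0, 45.0, 120.0],
    rng=random.Random(0),
)
print(trace.label, len(trace.points))
```

Each LabelledTrace pairs a sequence of (latitude, longitude, timestamp) points with a (line, direction) label, which matches the supervised sequence classification formulation described above. Passing a seeded random.Random keeps generated datasets reproducible.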


Association for European Transport