A Dynamic Bayesian Network Approach to Forecast Short-term Urban Rail Passenger Flows with Incomplete Data



Jérémy Roos, University of Lyon / RATP, Gérald Gavin, University of Lyon, Stéphane Bonnevay, University of Lyon


We propose a dynamic Bayesian network approach to forecast the short-term passenger flows in the urban rail network of Paris. The model is able to forecast in the presence of incomplete data.


RATP is one of the largest public transport operators in the Paris region. It uses several passenger flow modeling tools, which assess the long-term impact of infrastructure or transport policy changes. However, these models are not designed to provide short-term predictions and therefore cannot account for the effects of unanticipated or non-recurrent events (disruptions, strikes, temporary station closures, sporting or cultural events, etc.). Furthermore, much of the wide range of available data remains untapped. Each department works with a limited number of sources and therefore has only a partial view of passenger mobility within the network. Based on these observations, we propose an approach to forecast short-term passenger flows in the urban rail network of Paris. This approach combines data from several sources, namely ticket validations, counts, and transport service data (departure and arrival times at each stop point). It is designed to support many applications in real-time operations management, passenger information and service quality improvement.
Various methods have been proposed for short-term traffic forecasting, but only a few authors have focused on passenger flows in public transport networks. Moreover, although missing data is a common problem in many real-world situations, most existing models are not equipped to handle it in a real-time setting. This is a major drawback for large networks, where data collection systems are subject to failures or may not be used continuously.
The approach we chose is based on Bayesian networks. The main advantages of these models are their flexibility, which makes it possible to combine data from heterogeneous sources, and their ability to forecast in the presence of incomplete data. Because upstream passenger flows causally influence downstream flows, the structure of the model is derived directly from the transport network. Assuming that the interactions between flows are linear, we model the conditional probability distributions of the nodes as linear Gaussians. A first test is carried out at the RER station Nanterre-Préfecture, using a complete set of ticket validation and count data. This test yields encouraging results but highlights the need to account for the transport service.
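A minimal sketch of why the linear Gaussian assumption makes forecasting with incomplete data straightforward: if each downstream flow is a linear function of its upstream parents plus Gaussian noise, the joint distribution of all nodes is multivariate Gaussian, and a missing node is handled by conditioning only on the nodes that were observed. The two-parent station, the node roles and every numeric value below are hypothetical illustrations, not the authors' model or estimates; exact Gaussian conditioning is one standard inference method for such networks and is not necessarily the one used in the paper.

```python
import numpy as np


def joint_gaussian(mu_x, var_x, b0, b, var_y):
    """Joint mean and covariance of (X_1, ..., X_k, Y) for a linear Gaussian
    node Y | X ~ N(b0 + b . X, var_y) whose parents X_i are independent Gaussians."""
    mu_x = np.asarray(mu_x, dtype=float)
    b = np.asarray(b, dtype=float)
    cov_x = np.diag(np.asarray(var_x, dtype=float))
    k = len(mu_x)
    mu = np.append(mu_x, b0 + b @ mu_x)        # E[Y] = b0 + b . E[X]
    cov = np.zeros((k + 1, k + 1))
    cov[:k, :k] = cov_x
    cov[:k, k] = cov[k, :k] = cov_x @ b        # Cov(X, Y) = Sigma_X b
    cov[k, k] = b @ cov_x @ b + var_y          # Var(Y) = b' Sigma_X b + var_y
    return mu, cov


def condition(mu, cov, obs_idx, obs_val):
    """Posterior of the unobserved nodes given the observed ones (exact Gaussian conditioning)."""
    obs_idx = np.asarray(obs_idx)
    hid = np.setdiff1d(np.arange(len(mu)), obs_idx)
    s_ho = cov[np.ix_(hid, obs_idx)]
    s_oo = cov[np.ix_(obs_idx, obs_idx)]
    gain = np.linalg.solve(s_oo, s_ho.T).T     # Sigma_ho Sigma_oo^{-1}
    mean = mu[hid] + gain @ (np.asarray(obs_val, dtype=float) - mu[obs_idx])
    covar = cov[np.ix_(hid, hid)] - gain @ s_ho.T
    return hid, mean, covar


# Hypothetical station: two upstream flows X1 (e.g. validations) and X2 (e.g. a count)
# feed a downstream flow Y.
mu, cov = joint_gaussian(mu_x=[100, 50], var_x=[100, 25], b0=10, b=[0.8, 0.5], var_y=16)
# X2 is missing (e.g. a sensor failure); only X1 = 120 is observed.
hid, m, c = condition(mu, cov, obs_idx=[0], obs_val=[120])
# Posterior of Y given X1 = 120: mean 131, variance 22.25 (X2 keeps its prior N(50, 25)).
print(hid, m, c)
```

The missing parent is not imputed with a single value: its uncertainty is carried into the variance of the forecast for Y.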
To forecast future values of the flows, we then extend the model to the time dimension using a dynamic Bayesian network. A new layer of nodes is also added to integrate the transport service. We apply the model to the entire Paris metro line 2, using data collected on weekdays in March and April 2015 during the morning peak period. Because these data are incomplete, we use a structural expectation-maximization (SEM) algorithm both to simplify the network structure and to obtain maximum likelihood estimates of the parameters. Finally, short-term forecasts are obtained by inference with a stochastic simulation algorithm, the bootstrap filter. Overall, the model outperforms the historical average and last observation carried forward (LOCF) methods. The results also highlight the key role of the transport service in the model.
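The bootstrap filter can be sketched on a deliberately simplified, hypothetical one-node model: a passenger flow x_t with a first-order linear Gaussian transition driven by a transport service input u_t (for instance, the number of train departures in the interval), observed through a noisy count y_t that may be missing. Particles are propagated through the transition model, weighted by the observation likelihood when a count is available, and resampled; a missing count simply skips the weighting step, and forecasts propagate the particles forward with no observation. The model form, the parameter values and the function names `bootstrap_filter` and `forecast` are assumptions for illustration; the paper's network has many nodes whose structure and parameters are learned by SEM, which this sketch does not cover.

```python
import numpy as np


def bootstrap_filter(y, u, a, phi, gamma, q, r, x0_mean, x0_var, n_particles=2000, seed=0):
    """Bootstrap particle filter for the hypothetical scalar model
        x_t = a + phi * x_{t-1} + gamma * u_t + N(0, q)   (passenger flow)
        y_t = x_t + N(0, r)                              (count, np.nan if missing)
    Returns the filtered means and the final particle cloud."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(x0_mean, np.sqrt(x0_var), n_particles)
    filtered = np.empty(len(y))
    for t, (y_t, u_t) in enumerate(zip(y, u)):
        # 1. Propose from the transition model (the "bootstrap" proposal).
        particles = a + phi * particles + gamma * u_t + rng.normal(0.0, np.sqrt(q), n_particles)
        if not np.isnan(y_t):
            # 2. Weight by the Gaussian observation likelihood, in log space for stability.
            logw = -0.5 * (y_t - particles) ** 2 / r
            w = np.exp(logw - logw.max())
            w /= w.sum()
            # 3. Multinomial resampling.
            particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
        # A missing count skips steps 2-3: the prediction is carried forward unchanged.
        filtered[t] = particles.mean()
    return filtered, particles


def forecast(particles, u_future, a, phi, gamma, q, seed=1):
    """Multi-step forecast: propagate the particles with no observations available."""
    rng = np.random.default_rng(seed)
    out = np.empty(len(u_future))
    for h, u_t in enumerate(u_future):
        particles = a + phi * particles + gamma * u_t + rng.normal(0.0, np.sqrt(q), len(particles))
        out[h] = particles.mean()
    return out


# Simulated (not real) data with about 30% of the counts missing.
rng = np.random.default_rng(42)
T, a, phi, gamma, q, r = 60, 10.0, 0.5, 2.0, 4.0, 9.0
u = np.full(T + 10, 5.0)                  # hypothetical: 5 departures per interval
x_true = np.empty(T)
x_prev = 40.0
for t in range(T):
    x_prev = a + phi * x_prev + gamma * u[t] + rng.normal(0.0, np.sqrt(q))
    x_true[t] = x_prev
y = x_true + rng.normal(0.0, np.sqrt(r), T)
y[rng.random(T) < 0.3] = np.nan

filtered, particles = bootstrap_filter(y, u[:T], a, phi, gamma, q, r, x0_mean=40.0, x0_var=25.0)
fc = forecast(particles, u[T:], a, phi, gamma, q)
```

Skipping the update on missing counts is what lets the filter keep running through data gaps. With these illustrative values the forecast mean reverts toward the stationary level (a + gamma * u) / (1 - phi) = 40.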


Association for European Transport