Zhang, H., Lu, G., Zhang, Y., D'Ariano, A., & Wu, Y. (2025). Railcar itinerary optimization in railway marshalling yards: A graph neural network based deep reinforcement learning method. Transportation Research Part C: Emerging Technologies, 171, 104970. https://doi.org/10.1016/j.trc.2024.104970
Railcar itinerary optimization in railway marshalling yards: A graph neural network based deep reinforcement learning method
Zhang H.; Lu G.; Zhang Y.; D'Ariano A.; Wu Y.
2025-01-01
Abstract
The goal of Railcar Itinerary Optimization in Marshalling Yards (RIO-MY) is to produce an effective integrated operation plan for both train shunting and train makeup, with the aim of minimizing railcar dwell time in the railway marshalling yard. Owing to the complex, interdependent decisions involved in disassembling and assembling trains, conventional optimization methods struggle to address the dynamic nature of traffic in the marshalling yard and to deliver highly efficient solutions. This paper introduces a novel approach to the RIO-MY problem using a graph neural network based deep reinforcement learning method. First, we model the solution process of RIO-MY as a Markov decision process, using a tripartite graph to represent the operational state of a marshalling yard. We then design a novel tripartite graph isomorphism network (TGIN) to learn informative embeddings on this graph, which are exploited to derive a joint action that simultaneously decides hump sequencing and classification track assignment. The TGIN-based policy network is trained with the proximal policy optimization algorithm, using a reward tailored to accurately estimate railcar dwell time in each state. Moreover, we develop a discrete-event simulation of railway marshalling yard operations, which serves as the reinforcement learning environment and integrates typical heuristic rules for outbound train assembly and shunting locomotive scheduling. Extensive experiments on two real-world railway marshalling yards demonstrate that the proposed method outperforms conventional heuristic algorithms. It also achieves performance competitive with a mixed-integer nonlinear programming model at significantly lower computational cost. In addition, the trained policy networks generalize favourably to scenarios unseen during training and effectively handle disturbances in the train disassembly process.
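To make the tripartite-graph state encoding concrete, the following is a minimal sketch of one TGIN-style message-passing layer, built on the standard GIN update h'_v = MLP((1 + eps) * h_v + sum of neighbour features). The partition names (inbound trains, railcar groups, classification tracks), the incidence-matrix interface, and the per-partition MLPs are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TGINLayer(nn.Module):
    """One message-passing layer over a tripartite graph (illustrative sketch).

    Hypothetical node partitions: "T" (inbound trains), "C" (railcar
    groups), "K" (classification tracks). In a tripartite graph, edges
    only cross partitions, so each partition aggregates features from
    the other two via 0/1 incidence matrices.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Learnable GIN epsilon and a separate update MLP per partition.
        self.eps = nn.ParameterDict({p: nn.Parameter(torch.zeros(1)) for p in "TCK"})
        self.mlp = nn.ModuleDict({
            p: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for p in "TCK"
        })

    def forward(self, h: dict, adj: dict) -> dict:
        # h[p]: (n_p, dim) node features; adj[(p, q)]: (n_p, n_q) incidence.
        out = {}
        for p in "TCK":
            agg = torch.zeros_like(h[p])
            for q in "TCK":
                if q == p:
                    continue  # tripartite: no intra-partition edges
                a = adj.get((p, q))
                if a is not None:
                    agg = agg + a @ h[q]  # sum-aggregate cross-partition neighbours
            # GIN-style update: MLP((1 + eps) * self + neighbour sum)
            out[p] = self.mlp[p]((1 + self.eps[p]) * h[p] + agg)
        return out
```

Stacking a few such layers yields per-node embeddings for every train, railcar group, and track, from which the policy head can reason about the yard state.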
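The joint action over hump sequencing and classification track assignment can then be read out by scoring (train, track) pairs from the learned embeddings. The sketch below, again under assumed shapes and naming, shows one common way to build such a masked joint-action head for a PPO-trained policy; the paper's exact readout may differ.

```python
import torch
import torch.nn as nn

class JointActionHead(nn.Module):
    """Scores (inbound train, classification track) pairs to form a joint
    hump-sequencing / track-assignment action (sketch, not the authors'
    exact design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h_trains, h_tracks, feasible):
        # h_trains: (n_t, dim), h_tracks: (n_k, dim),
        # feasible: (n_t, n_k) boolean mask of allowed assignments.
        n_t, n_k = h_trains.size(0), h_tracks.size(0)
        pairs = torch.cat([
            h_trains.unsqueeze(1).expand(n_t, n_k, -1),
            h_tracks.unsqueeze(0).expand(n_t, n_k, -1),
        ], dim=-1)
        logits = self.score(pairs).squeeze(-1)
        logits = logits.masked_fill(~feasible, float("-inf"))  # forbid infeasible pairs
        # Flattened categorical over all feasible (train, track) pairs.
        return torch.distributions.Categorical(logits=logits.flatten())

# Usage sketch: sample a joint action and decode it.
#   dist = head(h["T"], h["K"], feasible)
#   a = dist.sample()
#   train_idx, track_idx = divmod(a.item(), n_k)
# PPO then maximizes the clipped surrogate objective using
# dist.log_prob(a) against the behaviour policy's log-probability.
```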