Rail traffic planning and scheduling problems have challenged academia and industry for decades. In the short-term and real-time horizons, these problems involve simultaneous decision-making by trains, stations, and terminals. Approaches based on decentralised decision-making have proved successful in delivering solutions that can be committed to real-world operations. This work focuses on decentralised real-time decision-making in a closed freight rail network and applies multi-agent deep reinforcement learning (MADRL) to find efficient timetables. We apply the MADRL model to the traffic decisions arising in the Hunter Valley Coal Chain (HVCC) in New South Wales, Australia. The approach uses the same simulation model currently employed for capacity planning of the system, thus allowing tests with real data. The environment is modelled as a decentralised, partially observable Markov decision process (dec-POMDP), in which train, load-point, and dump-station agents decide on train movements based on local observations. The observations follow a novel nine-layer state encoding strategy for rail traffic management. We exploit this strategy to apply centralised learning with decentralised execution through proximal policy optimisation. The experiments revealed a significant performance improvement across the ten instances tested, which reproduce the challenges faced in HVCC operations. The approach suits varied levels of rail network complexity, generating efficient solutions without scaling issues. The MADRL outperformed both the heuristic used in HVCC's simulation model and a high-performance genetic algorithm in all instances, achieving performance improvements of up to 72.00% and 47.42%, respectively. The framework combining the MADRL and the simulation model therefore supports efficient and reliable application to real-world instances.
These results demonstrate the method's consistency and chart a reliable path towards a decentralised rail traffic management system.
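The centralised-learning, decentralised-execution setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer count comes from the abstract's nine-layer encoding, but the grid shape, agent names, and the sample ratio/advantage values are assumptions made for the example.

```python
import numpy as np

N_LAYERS = 9          # the abstract's nine-layer state encoding
GRID = (4, 16)        # assumed spatial resolution of each layer (hypothetical)

def local_observation(rng):
    """Each agent (train, load point, dump station) sees only a local
    stack of nine feature layers; layer semantics are assumed here."""
    return rng.random((N_LAYERS, *GRID))

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of proximal policy optimisation:
    mean over samples of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))

rng = np.random.default_rng(0)
# Decentralised execution: each agent acts from its own local observation.
obs = {a: local_observation(rng)
       for a in ("train_1", "load_point_1", "dump_station_1")}

# Centralised learning: experience from all agents is pooled into one batch;
# the probability ratios and advantages below are illustrative placeholders.
ratio = np.array([0.9, 1.5, 1.0])
adv = np.array([1.0, 1.0, -1.0])
loss = ppo_clip_objective(ratio, adv)
```

The clipping keeps the policy update close to the behaviour policy: the second sample's ratio of 1.5 is truncated to 1.2 before weighting its advantage, which is what stabilises on-policy updates when many agents contribute to the same batch.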