<p>We propose a novel scheme for reconstructing charged particle tracks in digital tracking calorimeters using model-free reinforcement learning, aiming to benefit from the rapid progress and success of neural network architectures for tracking without depending on simulated or manually labeled data. We optimize, by trial and error, a behavior policy that acts as a heuristic approximation to the full combinatorial optimization problem and maximizes the physical plausibility of sampled trajectories. In modern data processing pipelines used in high energy physics experiments and related applications, tracking plays an essential role, making it possible to identify and follow the trajectories of charged particles traversing particle detectors. Because of the typically high multiplicity of charged particles and the physical interactions that randomly deflect them from their initial paths, reconstruction is challenging and requires fast, accurate, and robust algorithms. Our approach operates on graph-structured data, capturing possible track hypotheses as edge connections between particles in the sensitive detector layers. In a comprehensive study on simulated data generated for a particle detector used for proton computed tomography, we demonstrate the high potential of our approach and its competitiveness with a heuristic search algorithm and a model trained on ground truth information. Finally, we point out limitations of our approach, providing guidance toward a robust foundation for the further development of reinforcement learning based tracking algorithms in high energy physics.</p>