Objective. Despite its high treatment quality, the CyberKnife system has a long treatment time, which is regarded as a drawback. The high flexibility of its robotic arm calls for meticulous path-finding algorithms to deliver the prescribed dose in the shortest possible time. Approach. We propose a deep Q-learning method based on graph neural networks to find a subset of the beams and the order in which to traverse them. A complex reward function is defined to minimize the distance covered by the robotic arm while avoiding the selection of beams that are close to one another. Individual beam scores are also generated based on their effect on the beam intensity and are incorporated into the reward function. Main results. The performance of the presented method is evaluated on three clinical lung cancer cases. Applying this approach leads to an average 35% reduction in treatment time while still delivering the dose prescribed by the physicians. Significance. Shorter treatment times give individual patients a better treatment experience, reducing discomfort and the side effects of inadvertent movements. They also create the opportunity to treat a higher number of patients in a given time period at radiation therapy centers.
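The three ingredients of the reward named in the Approach (arm travel distance, avoidance of close beams, and individual beam scores) can be illustrated with a minimal sketch. The abstract does not give the functional form, so the linear combination, the weights `w_dist`, `w_close` and `w_score`, the separation threshold `min_sep`, and the function name `step_reward` are all assumptions made purely for illustration.

```python
import math

def step_reward(current, candidate, selected, score,
                w_dist=1.0, w_close=1.0, w_score=1.0, min_sep=10.0):
    """Hypothetical per-step reward for moving the arm to a candidate beam.

    current, candidate: 3D node positions as (x, y, z) tuples.
    selected: positions of beams already chosen in this episode.
    score: individual score of the candidate beam (assumed precomputed).
    All weights and min_sep are illustrative assumptions, not paper values.
    """
    # Distance the robotic arm travels for this step (to be minimized).
    travel = math.dist(current, candidate)
    # Number of already-selected beams closer than min_sep to the candidate.
    crowding = sum(1 for p in selected if math.dist(p, candidate) < min_sep)
    # Reward useful beams; penalize long moves and clustered selections.
    return w_score * score - w_dist * travel - w_close * crowding
```

In a deep Q-learning setting, a value like this would be returned by the environment after each beam selection; how the terms are actually weighted and normalized would have to be taken from the full paper.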