AI-driven applications and the large volumes of data generated by IoT devices connected to large-scale utility infrastructures pose significant operational challenges, including increased latency, communication overhead, and computational imbalances. Addressing them requires shifting workloads from the cloud to the edge and across the entire computing continuum. However, significant challenges remain, particularly in the decision making needed to manage the trade-offs associated with workload offloading. In this paper, we propose a task-offloading solution that uses Reinforcement Learning (RL) to dynamically balance workloads and reduce overloads. We adapt the Deep Q-Learning algorithm to our workload-offloading problem. The reward function considers each node's computational state and type in order to increase the utilization of computational resources while minimizing latency and bandwidth usage. A knowledge graph model of the computing continuum infrastructure is used to address environment-modeling challenges and facilitate RL. The learning agent's performance was evaluated under different hyperparameter configurations and with varying episode lengths and knowledge graph model sizes. Results show that a low, steady learning rate and a large replay buffer size are important for effective learning. The agent also exhibits strong convergence behavior, identifying relevant workload task and node pairs after each learning episode, and scales well, with the number of offloading pairs and actions increasing with the size of the knowledge graph and the episode count.
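
As an illustrative aside, and not the paper's exact implementation, the sketch below shows the kind of Deep Q-Learning agent the abstract refers to, written in PyTorch. The state encoding (task features plus node load and type), the action space of candidate nodes, and all hyperparameters are assumptions chosen only to make the example self-contained.

```python
# Minimal DQN sketch for a task-offloading agent (hypothetical encoding and sizes).
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

STATE_DIM = 8    # assumed: encoded task features + candidate node load, type, latency
N_ACTIONS = 4    # assumed: offload to one of 4 candidate nodes in the continuum


class QNetwork(nn.Module):
    """Small fully connected network mapping a state to Q-values per offloading action."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


q_net = QNetwork(STATE_DIM, N_ACTIONS)
target_net = QNetwork(STATE_DIM, N_ACTIONS)
target_net.load_state_dict(q_net.state_dict())

optimizer = optim.Adam(q_net.parameters(), lr=1e-4)   # low, steady learning rate
replay = deque(maxlen=100_000)                        # large replay buffer
GAMMA, BATCH = 0.99, 64


def select_action(state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy choice of the node to offload the current task to."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())


def train_step():
    """One gradient step on a sampled mini-batch; transitions are stored as tensors."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    states, actions, rewards, next_states, dones = map(torch.stack, zip(*batch))
    q_values = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch, the reward attached to each stored transition would encode the node's computational state and type, penalizing latency and bandwidth use as described above; how those terms are weighted is left open here.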