We consider a multi-source relaying system in which independent sources randomly generate status update packets that are sent to the destination via a relay over unreliable links. We develop transmission scheduling policies that minimize the weighted sum average age of information (AoI) subject to transmission capacity and long-run average resource constraints. We formulate a stochastic control optimization problem and solve it using two approaches: a constrained Markov decision process (CMDP) approach and a drift-plus-penalty method. The CMDP problem is solved by transforming it into an MDP problem via Lagrangian relaxation. We theoretically analyze the structure of optimal policies for the MDP problem and then propose a structure-aware algorithm that returns a practical near-optimal policy. Using the drift-plus-penalty method, we devise a near-optimal low-complexity policy that makes scheduling decisions dynamically. We also develop a model-free deep reinforcement learning policy that employs Lyapunov optimization theory and a dueling double deep Q-network. The complexities of the proposed policies are analyzed. Simulation results are provided to assess the performance of our policies and validate the theoretical results. The results show up to 91% performance improvement compared to a baseline policy.

Index Terms-Age of information (AoI), relay, constrained Markov decision process (CMDP), drift-plus-penalty, deep reinforcement learning.

1 This relay could be a static node [10] or a mobile node, e.g., an unmanned aerial vehicle (UAV) [26]-[31] or a vehicle in vehicular communications [32]. For instance, in [30], multiple UAVs serve as mobile relays between the sensors and the base station, and the goal is to optimize the UAVs' trajectories to minimize the average AoI and energy consumption.
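To make the drift-plus-penalty idea from the abstract concrete, the following is a minimal sketch under a deliberately simplified, hypothetical model. It is not the authors' policy. Several assumptions are made here. Each slot, the relay either idles or forwards the freshest stored packet of one source, so at most one transmission occurs per slot (the capacity constraint). Source i's packet reaches the destination with success probability `p[i]`, and a fresh packet reaches the relay with probability `arrival[i]`, which lumps together random generation and the source-relay link. Each transmission costs one resource unit, and the long-run average cost must not exceed `gamma`. The average constraint is enforced through a virtual queue `Q`. Each slot, the action minimizes `V * E[weighted AoI next slot] + Q * cost`, where `V` trades AoI against constraint slack. All names (`dpp_action`, `simulate`, `gamma`, `v`) are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class SimResult:
    avg_weighted_aoi: float      # time average of sum_i w_i * delta_i(t)
    avg_transmissions: float     # time-average resource usage per slot
    final_virtual_queue: float   # Q(T) of the average-resource virtual queue


def dpp_action(delta, theta, w, p, q, v):
    """Return 0 (idle) or i + 1 (forward source i) minimizing the per-slot
    drift-plus-penalty objective V * E[sum_i w_i delta_i(t+1)] + Q * cost(a).

    Scores are taken relative to idling (score 0): forwarding source i lowers
    the expected weighted destination AoI by p_i * w_i * max(delta_i - theta_i, 0)
    and adds Q for the unit resource cost (hypothetical cost model).
    """
    best_action, best_score = 0, 0.0
    for i in range(len(delta)):
        gain = p[i] * w[i] * max(delta[i] - theta[i], 0)
        score = q - v * gain
        if score < best_score:
            best_action, best_score = i + 1, score
    return best_action


def simulate(w, p, arrival, gamma, v, slots, seed=0):
    """Simulate the hypothetical slotted relay model under the DPP policy."""
    rng = random.Random(seed)
    n = len(w)
    delta = [1] * n  # AoI at the destination
    theta = [1] * n  # AoI of the freshest packet stored at the relay
    q = 0.0          # virtual queue for the long-run average resource constraint
    aoi_sum = tx_sum = 0.0
    for _ in range(slots):
        a = dpp_action(delta, theta, w, p, q, v)
        cost = 1 if a > 0 else 0
        delivered = a > 0 and rng.random() < p[a - 1]
        # Destination AoI: reset to the relay packet's age + 1 on success.
        for i in range(n):
            if delivered and i == a - 1:
                delta[i] = min(delta[i], theta[i]) + 1
            else:
                delta[i] += 1
        # Relay AoI: a fresh packet arrives with probability arrival[i].
        for i in range(n):
            theta[i] = 1 if rng.random() < arrival[i] else theta[i] + 1
        # Virtual queue: Q(t+1) = max(Q(t) - gamma + cost(t), 0).
        q = max(q - gamma + cost, 0.0)
        aoi_sum += sum(wi * di for wi, di in zip(w, delta))
        tx_sum += cost
    return SimResult(aoi_sum / slots, tx_sum / slots, q)


if __name__ == "__main__":
    res = simulate(w=[1.0, 2.0], p=[0.8, 0.6], arrival=[0.5, 0.3],
                   gamma=0.4, v=10.0, slots=20000)
    print(res)
```

Because `Q(t+1) >= Q(t) - gamma + cost(t)`, summing over `T` slots gives `avg cost <= gamma + Q(T)/T`. Keeping the virtual queue stable therefore drives the average resource use toward the budget. A larger `V` favors lower AoI at the price of a larger `Q`, which means slower convergence to the constraint.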