In this paper, the minimization of the weighted sum average age of information (AoI) in a multisource status update communication system is studied. Multiple independent sources send update packets to a common destination node in a time-slotted manner, subject to a limit on the maximum number of retransmission rounds. Two multiple access schemes, i.e., orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA), are exploited over a block-fading multiple access channel (MAC). Constrained Markov decision process (CMDP) problems are formulated to describe the AoI minimization problems under both transmission schemes. The Lagrangian method is utilised to convert the CMDP problems to unconstrained Markov decision process (MDP) problems, and corresponding algorithms to derive the power allocation policies are obtained. For the case of unknown environments, two online reinforcement learning approaches considering both multiple access schemes are proposed to achieve near-optimal age performance. Numerical simulations confirm that the proposed policy improves the weighted sum AoI compared to the fixed power transmission policy, and illustrate that NOMA is more favorable for larger packet sizes.

networks and disaster monitoring and alerting systems, strictly guaranteeing the timeliness of information updates is crucial since outdated information might become worthless. From the system perspective, knowledge of the status of a remote sensor or system needs to be as timely as possible, so the timeliness of state updates has evolved into a new field of network research [3].
To characterize such information timeliness and freshness, the metric termed age of information (AoI), typically defined as the time elapsed since the most recent successfully received system information was generated at the source, has been proposed [4]. Most of the earlier work on AoI in various networks considers simple single-source single-destination status update system models (see, e.g., [4]-[8]), while recent research on AoI optimization has shifted to more practical multi-source and/or multi-destination systems, most of which involve the orthogonal multiple access (OMA) technique [9]-[14]. For instance, the authors in [9] considered a system model in which a central controller collects data from multiple sensors via wireless links and the AoI optimization problem is subject to both bandwidth and power consumption constraints. In addition, a truncated scheduling policy was proposed in [9] to satisfy the hard bandwidth constraint. The work in [10] presented two multi-source information update problems in a practical IoT system, called the AoI-aware Multi-Source Information Updating (AoI-MSIU) and AoI-Reduction-aware Multi-Source Information Updating (AoIR-MSIU) problems, respectively. A wireless broadcast network with random arrivals was considered in [11], where two offline and two online scheduling algorithms were proposed, leveraging Markov decision process (MDP) techniques and the ...