“…Many studies have applied learning techniques to RA for D2D communications [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. DL [18, 19, 20, 21, 22], RL [23, 24, 25], and DRL [26, 27, 28, 29, 30, 31, 32, 33, 34] have been widely used as the learning paradigms for training RA units. Depending on which entity determines the resource allocation for D2D devices, two types of RA schemes have been proposed: centralized RA [18, 19, 20, 23, 26, 27, 28, 31] and decentralized RA [20, 21, 22, 23, 25, 29, 30, 32, 33, 34].…”