This paper studies the problem of non-cooperative radio resource scheduling in a vehicle-to-vehicle communication network. The technical challenges lie in high vehicle mobility and data traffic variations. Over the discrete scheduling slots, each vehicle user equipment (VUE)-pair competes with other VUE-pairs in the coverage of a road side unit (RSU) for the limited frequency to transmit queued packets. The frequency allocation by the RSU at the beginning of each slot is regulated by a sealed second-price auction. Each VUE-pair aims to optimize its expected long-term performance. Such interactions among VUE-pairs are modelled as a stochastic game with a semi-continuous global network state space. By defining a partitioned control policy, we transform the stochastic game into an equivalent game with a global queue state space of finite size. We adopt an oblivious equilibrium (OE) to approximate the Markov perfect equilibrium (MPE), which characterizes the optimal solution to the equivalent game. The OE solution is theoretically proven to possess an asymptotic Markov equilibrium property. Due to the lack of a priori knowledge of network dynamics, we derive an online algorithm to learn the OE policies. Numerical simulations validate the theoretical analysis and show the effectiveness of the proposed online learning algorithm.
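To make the sealed second-price rule concrete, the following minimal Python sketch resolves one scheduling slot under a strong simplifying assumption: a single indivisible channel is auctioned and each VUE-pair submits one scalar bid. The function name `second_price_auction`, the bidder identifiers, and the tie-breaking rule are hypothetical and chosen for illustration only; the auction considered in this paper may allocate several channels per slot and is not reproduced here.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Resolve a single-item sealed second-price auction.

    bids maps a (hypothetical) VUE-pair identifier to its sealed bid.
    Returns (winner, payment); winner is None when no bids are submitted.
    """
    if not bids:
        return None, 0.0
    # Rank bidders by bid in descending order; ties are broken by
    # identifier so the outcome is deterministic (an assumption).
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    # The winner pays the second-highest bid, or nothing if unopposed.
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, payment


if __name__ == "__main__":
    slot_bids = {"vue_pair_1": 3.0, "vue_pair_2": 5.0, "vue_pair_3": 4.0}
    print(second_price_auction(slot_bids))
```

In this example the highest bidder, `vue_pair_2`, wins the channel but pays only the second-highest bid of 4.0, so its payment does not depend on its own bid.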
Index Terms: Vehicle-to-vehicle communications, multi-user resource scheduling, stochastic games, Markov decision process, Markov perfect equilibrium, oblivious equilibrium, learning.

X. Chen is with the ...

I. INTRODUCTION

The next generation vehicle-to-everything (V2X) technologies have been receiving increasing attention for enabling emerging vehicular services, such as traffic safety, congestion reporting and in-vehicle infotainment [1]-[3]. In particular, vehicle-to-vehicle (V2V) communication, operating in an ad hoc manner, provides more flexibility to support attractive vehicle-related applications [4]. Such vehicular applications inherently require coordination among vehicles in close proximity [5]. However, the topology of a V2V communication network changes dynamically over time because of the high vehicle mobility. Without the support of an infrastructure, this in turn makes the design of radio resource management (RRM) techniques extremely challenging [6].

In the literature, a number of works focus on RRM in V2V communications. In [7], Bai et al. proposed a low-complexity outage-optimal distributed channel allocation scheme for V2V communications based on maximum matching. In [8], Sun et al. investigated RRM for device-to-device based V2V communications, for which a separate resource block and power allocation algorithm was proposed. In [9], Yao et al. proposed a loss differentiation rate adaptation scheme to meet the stringent delay and reliability requirements of V2V safety communications. In [10], Egea-Lopez et al. proposed a fair adaptive beaconing rate for intervehicular communications algorithm to solve the...