With increasing attention to cutting-edge vehicular communication applications, distributed resource allocation is beneficial to direct communications between vehicle nodes. However, in highly dynamic distributed vehicular networks, the quality of service (QoS) of the system degrades dramatically because of severe packet collisions in the absence of sufficient link knowledge. Focusing on fairness optimization, a Q-learning-based collision avoidance (QCA) scheme, characterized by a bidirectional backoff reward model R_QCA that covers arbitrary backoff stage transitions, has been proposed within an intelligent distributed medium access control protocol. In QCA, an intelligent bidirectional backoff agent based on a Markov decision process model actively motivates each vehicle agent to update itself individually toward an optimal backoff sub-interval BSI_opt through either a positive or a negative bidirectional transition, resulting in distinctly fair communication with a proper balance of resource allocation. According to reinforcement learning theory, evaluating the goodness of the backoff stage self-selection policy is equivalent to maximizing the Q function of the vehicle in the current environment. The final decision on BSI_opt, which corresponds to an optimal contention window range, was obtained by maximizing the Q value, i.e., Q_max. The ε-greedy algorithm was used to maintain reasonable convergence toward the Q_max solution. For the fairness evaluation of QCA, four kinds of dynamic impacts on vehicular networks were investigated with the network simulator NS2: mobility, density, payload size, and data rate. Consequently, QCA achieves fair communication efficiently and robustly, with the advantages of a superior Jain's fairness index, a relatively high packet delivery ratio, and low time delay.
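To make the mechanism concrete, the following Python sketch shows one plausible shape of the Q-learning backoff loop described above: states are backoff stages, actions are bidirectional transitions (move to a lower stage, stay, or move to a higher stage), and an ε-greedy rule balances exploration against exploitation. The stage count, learning parameters, reward shape standing in for R_QCA, and the toy collision model are all illustrative assumptions, not the paper's exact formulation.

```python
import random

N_STAGES = 6           # number of backoff stages (assumed)
ACTIONS = (-1, 0, +1)  # bidirectional transitions: down / stay / up
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

# Q-table: Q[stage][action_index]
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STAGES)]

def choose_action(stage: int) -> int:
    """ε-greedy: mostly exploit the best-valued transition, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    row = Q[stage]
    return max(range(len(ACTIONS)), key=row.__getitem__)

def step(stage: int, a_idx: int) -> int:
    """Apply a bidirectional stage transition, clamped to the valid range."""
    return min(N_STAGES - 1, max(0, stage + ACTIONS[a_idx]))

def reward(success: bool) -> float:
    """Assumed stand-in for R_QCA: positive feedback on a collision-free
    transmission, negative feedback on a collision."""
    return 1.0 if success else -1.0

def update(stage: int, a_idx: int, r: float, next_stage: int) -> None:
    """Standard Q-learning update toward the best next-state action value."""
    best_next = max(Q[next_stage])
    Q[stage][a_idx] += ALPHA * (r + GAMMA * best_next - Q[stage][a_idx])

# Toy per-vehicle loop: collision probability shrinks at higher backoff
# stages (a purely illustrative channel model, not the NS2 simulation).
stage = 0
for _ in range(10_000):
    a = choose_action(stage)
    nxt = step(stage, a)
    success = random.random() > 0.5 / (nxt + 1)
    update(stage, a, reward(success), nxt)
    stage = nxt
```

Under this reading, each vehicle agent would run such a loop per transmission attempt, and the stage whose converged Q value is largest identifies the backoff sub-interval playing the role of BSI_opt.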