The exponential growth in demand from wireless devices for high-performance services such as video streaming and gaming poses several challenges for Wireless Local Area Networks (WLANs). In the context of Wi-Fi, the newest standards, IEEE 802.11ax and 802.11be, bring high data rates to dense user deployments. They also introduce flexible physical-layer features, such as dynamic Clear Channel Assessment (CCA) thresholds, to improve spatial reuse (SR) in response to radio spectrum scarcity in dense scenarios. In this paper, we formulate the Transmission Power (TP) and CCA configuration problem with the objective of maximizing fairness and minimizing station starvation. We present five main contributions to distributed SR optimization using Multi-Agent Multi-Armed Bandits (MA-MABs). First, we provide a regret analysis for the distributed Multi-Agent Contextual MABs (MA-CMABs) proposed in this work. Second, we propose reducing the action space, given the large cardinality of the combinations of TP and CCA threshold values per Access Point (AP). Third, we present two deep MA-CMAB algorithms, Sample Average Uncertainty (SAU)-Coop and SAU-NonCoop, as cooperative and non-cooperative versions to improve SR. Fourth, we analyze the viability of MA-MAB solutions based on the ϵ-greedy, Upper Confidence Bound (UCB), and Thompson Sampling (TS) techniques. Finally, we propose a deep reinforcement transfer learning technique to improve adaptability in dynamic environments. Simulation results show that cooperation via the SAU-Coop algorithm leads to a 14.7% improvement in cumulative throughput and a 32.5% reduction in Packet Loss Rate (PLR) compared with non-cooperative approaches. Under dynamic scenarios, transfer learning mitigates service drops for at least 60% of the total users.