Mobile Ad hoc Networks (MANETs) are vulnerable to various attacks, such as the Black Hole Attack (BHA), the Gray Hole Attack (GHA), and the Wormhole Attack (WHA). While researchers have focused on detecting and mitigating these attacks individually, protection against collaborative attacks remains limited. This article therefore introduces a new Tunicate Swarm Optimization Q-learning-based Collaborative Attacker Detection Algorithm (TSOQCADA) that identifies and blocks collaborative attackers mounting BHA, GHA, and WHA, thereby improving routing efficiency. The algorithm uses feedback from all nodes about node properties, namely energy, reputation, buffer space, transmission delay, and packet transfer rate, to select efficient packet routes. In the TSOQ-learning algorithm, TSO is used to set the initial Q-table values, which speeds up the convergence of Q-learning. First, a Q-table is trained with prior knowledge to improve search capability. Second, a novel selective search mechanism considers the correlation between the current and target locations to improve exploration efficiency and reduce unnecessary exploration. Third, a nonlinear function dynamically adjusts the ε value of the ε-greedy method according to the number of iterations, balancing exploration and exploitation in Q-learning. As a result, TSOQ-learning efficiently obtains a routing path that isolates collaborative attackers with low reputation values. Simulation results for a MANET of 100 nodes, 20 of them malicious, show that TSOQCADA achieves a Packet Delivery Ratio (PDR) of 94.8%, a Packet Loss Rate (PLR) of 5.2%, an energy consumption of 2.53 J/packet, a throughput of 355 kbps, and an End-to-End (E2E) delay of 35 ms, outperforming the Efficient Trust-based Routing Scheme (ETRS), the Hybrid Trust-based Reputation Mechanism (HTRM), and the Deep Neural Learned Projective Pursuit Regression-based Watchdog Malicious Node Detection and Isolation (DNLPPR-WMNDI) algorithms.
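To make the routing idea above more concrete, the following is a minimal sketch of reputation-filtered Q-learning for next-hop selection with an iteration-dependent ε. It is not the authors' TSOQCADA implementation, and several parts are assumptions: the Q-table starts at zero (the TSO-based initialization is not reproduced), the nonlinear ε function is a hypothetical exponential decay (`epsilon_schedule`), the reward is a hypothetical weighted sum of normalized node feedback (`WEIGHTS`, `node_score`), the isolation rule is a hypothetical reputation threshold of 0.5 (`trusted`), and the selective search mechanism is replaced by the simple rule of skipping already-visited nodes. The topology and metric values are invented for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical nonlinear epsilon schedule: exponential decay from eps_max toward
# eps_min as the iteration count t approaches total (assumed form, not the paper's).
def epsilon_schedule(t, total, eps_max=0.9, eps_min=0.05, k=5.0):
    return eps_min + (eps_max - eps_min) * math.exp(-k * t / total)

# Hypothetical reward weights over normalized node feedback (all values in [0, 1]);
# higher delay lowers the score, the other properties raise it.
WEIGHTS = {"energy": 0.2, "reputation": 0.4, "buffer": 0.1, "delay": -0.1, "ptr": 0.2}

def node_score(metrics):
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

def trusted(neighbors, nodes, threshold=0.5):
    # Isolate suspected attackers: drop neighbors whose reputation is below threshold.
    return [n for n in neighbors if nodes[n]["reputation"] >= threshold]

def train(graph, nodes, src, dst, episodes=300, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(node, next_hop)], zero-initialized (no TSO here)
    for ep in range(episodes):
        eps = epsilon_schedule(ep, episodes)
        node, visited = src, {src}
        while node != dst:
            # Stand-in for selective search: only unvisited, trusted neighbors.
            cands = [n for n in trusted(graph[node], nodes) if n not in visited]
            if not cands:
                break  # dead end: abandon this episode
            if rng.random() < eps:
                nxt = rng.choice(cands)  # explore
            else:
                nxt = max(cands, key=lambda n: q[(node, n)])  # exploit
            # Per-hop reward: node quality minus a hop cost, plus a bonus at the destination.
            reward = node_score(nodes[nxt]) - 1.0 + (10.0 if nxt == dst else 0.0)
            future = [q[(nxt, n)] for n in trusted(graph[nxt], nodes)]
            target = reward + (0.0 if nxt == dst else gamma * max(future, default=0.0))
            q[(node, nxt)] += alpha * (target - q[(node, nxt)])  # Q-learning update
            node = nxt
            visited.add(node)
    return q

def greedy_path(q, graph, nodes, src, dst):
    # Follow the highest-Q trusted neighbor from src until dst (or a dead end).
    path, node = [src], src
    while node != dst:
        cands = [n for n in trusted(graph[node], nodes) if n not in path]
        if not cands:
            return None
        node = max(cands, key=lambda n: q[(node, n)])
        path.append(node)
    return path

def metrics(energy, rep, buf, delay, ptr):
    return {"energy": energy, "reputation": rep, "buffer": buf, "delay": delay, "ptr": ptr}

# Illustrative 6-node topology; node 2 has low reputation (e.g. a black hole node).
nodes = {
    0: metrics(1.0, 1.0, 1.0, 0.1, 1.0),
    1: metrics(0.8, 0.9, 0.7, 0.3, 0.9),
    2: metrics(0.9, 0.1, 0.9, 0.1, 0.2),
    3: metrics(0.7, 0.8, 0.8, 0.2, 0.9),
    4: metrics(0.6, 0.85, 0.6, 0.4, 0.8),
    5: metrics(1.0, 1.0, 1.0, 0.1, 1.0),
}
graph = {0: [1, 2, 3], 1: [4], 2: [5], 3: [5], 4: [5], 5: []}

q = train(graph, nodes, 0, 5)
print(greedy_path(q, graph, nodes, 0, 5))
```

Because low-reputation nodes are filtered out before both exploration and exploitation, the learned route can never pass through node 2, however attractive its other metrics look; the decaying ε only shifts the balance from trying alternative trusted routes early on to exploiting the best-valued one later.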