Penetration testing (PT) is an effective means of examining networks and identifying vulnerabilities: it simulates a hacker's attack to uncover valuable information, such as details about a host's operating system and database systems. Robust penetration testing is crucial for assessing system vulnerabilities in the constantly changing cybersecurity landscape. Existing methods often struggle to adapt to dynamic threats, offer limited automation, and fail to discern subtle security weaknesses. Compared with manual PT, intelligent PT has gained widespread popularity because of its efficiency, which reduces both time consumption and labor costs. Accordingly, an effective penetration testing framework is developed using a prairie natural swarm (PNS) optimized Q-learning ensemble deep CNN. Initially, the penetration testing environment (Shodan search engine) is simulated and an expert knowledge base is generated. Subsequently, the Nmap Scripting Engine and Metasploit are deployed, providing robust tools for network investigation and vulnerability assessment. The system state is then relayed to the Q-learning ensemble deep convolutional neural network (Q-learning ensemble deep CNN) classifier. This ensemble combines the strengths of Q-learning and deep CNNs, enabling optimal policy learning for decision-making. The prairie natural swarm optimization algorithm, developed by hybridizing coyote and particle swarm characteristics, fine-tunes the classifier parameters to enhance performance. Additionally, the discriminator is trained to maximize standard action rewards while minimizing discounted action rewards, distinguishing valuable from less valuable information. By evaluating the advantage function, the likelihood of successful penetration is determined, which informs situational decision-making through the Q-learning ensemble deep CNN classifier.
The output of the proposed PNS-optimized Q-learning ensemble deep CNN is evaluated in terms of accuracy, sensitivity, and specificity. Compared with other approaches currently in use, the proposed model achieves values of 94.54%, 94.40%, and 94.90% for TP and 94.64%, 94.69%, and 94.52% for k-fold.
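For reference, the three evaluation metrics named above can be computed from a binary confusion matrix as in the minimal sketch below. It assumes labels encoded as 1 (positive) and 0 (negative); the function names `confusion_counts` and `evaluate` and the sample labels are hypothetical and are not taken from the proposed framework. The variable `tp` in the code denotes true positives and is unrelated to the "TP" evaluation setting reported above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # True positive rate: positives that were correctly detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # True negative rate: negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Illustrative labels only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can look high even when one class is rarely detected.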