Stagnation at local optima is a significant challenge in bio-inspired optimization algorithms and often leads to suboptimal solutions. This paper addresses the issue by proposing a hybrid model that combines the Orca Predator Algorithm with Deep Q-Learning. The Orca Predator Algorithm is an optimization technique that mimics the hunting behavior of orcas; it solves complex optimization problems by efficiently exploring and exploiting the search space. Deep Q-Learning is a reinforcement learning technique that combines Q-Learning with deep neural networks. The integration aims to turn the stagnation problem into an opportunity for more focused and effective exploitation, improving the performance and accuracy of the optimization technique. The proposed hybrid model leverages the biomimetic strengths of the Orca Predator Algorithm to identify promising nearby regions of the search space, complemented by the fine-tuning capabilities of Deep Q-Learning to navigate these regions precisely. The approach is evaluated on the feature selection problem using the high-dimensional Heartbeat Categorization Dataset. This dataset, comprising complex electrocardiogram signals, provides a robust platform for testing the feature selection capabilities of the hybrid model. The experimental results are encouraging: the hybrid strategy identifies relevant features without significantly compromising the performance metrics of machine learning models. The analysis compares the improved Orca Predator Algorithm against its native version and a set of state-of-the-art algorithms.
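For readers unfamiliar with the Q-Learning component mentioned above, the following minimal sketch shows the standard tabular Q-Learning update. It is an illustration only, not the method used in this paper: Deep Q-Learning replaces the table with a neural network, and the function name `q_update`, the state and action counts, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning step in place and return the table.

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values taken from the paper.
    """
    # Temporal-difference target: immediate reward plus the discounted
    # value of the best action available in the next state.
    td_target = reward + gamma * max(q_table[next_state])
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table


# Hypothetical toy problem: 2 states, 3 actions, all estimates start at zero.
q = [[0.0] * 3 for _ in range(2)]
q_update(q, state=0, action=1, reward=1.0, next_state=1)
```

In a hybrid of the kind described here, the states, actions, and reward would have to be defined in terms of the optimizer's search behavior; those definitions are not given in this abstract and are not assumed by the sketch.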