Belady's algorithm is widely known as an optimal cache replacement policy. It has been the foundation of numerous recent studies on cache replacement policies, and most studies assume this as an upper limit. Despite its widespread adoption, we discovered opportunities to unleash the headroom by addressing cache access types and implementing cache bypass. In this study, we propose Stormbird, a cache replacement policy that synergistically integrates the extensions of Belady's algorithm and the power of reinforcement learning. Reinforcement learning is well-suited for cache replacement policy problems owing to its ability to interact dynamically with the environment, adapt to changing access patterns, and optimize the maximum cumulative rewards. Stormbird utilizes several selected features from the reinforcement learning model to enhance the instructions per cycle efficiency while maintaining a low hardware overhead. Furthermore, it considers cache access types and integrates dynamic set dueling techniques to improve the cache performance. For 2 MB last-level cache per core, Stormbird achieves an average instructions per cycle improvement of 0.13% over the previous state-of-the-art on a single-core system and 0.02% on a fourcore system while simultaneously reducing hardware overhead by 62.5%. Stormbird incurs a low hardware overhead of only 10.5 KB for 2 MB last-level cache and can be implemented without using program counter values.INDEX TERMS Computer architecture, caches, reinforcement learning, replacement policy, Belady's algorithm, set dueling, cache access type.