Web phishing is a form of cybercrime aimed at tricking people into visiting malicious URLs to exfiltrate sensitive data. Since the structure of a malicious URL evolves over time, phishing detection mechanisms that can adapt to such variations are paramount. Furthermore, web phishing detection is an unbalanced classification task, as legitimate URLs outnumber malicious ones in real-life cases. Deep learning (DL) has emerged as a promising technique to minimize concept drift to enhance web phishing detection. Deep reinforcement learning (DRL) combines DL with reinforcement learning (RL); that is, a sequential decision-making paradigm in which the problem to be addressed is expressed as a Markov decision process (MDP). Recent studies have proposed an ad hoc MDP formulation to tackle unbalanced classification tasks called the imbalanced classification Markov decision process (ICMDP). In this paper, we exploit the ICMDP to present a double deep Q-Network (DDQN)-based classifier to address the unbalanced web phishing classification problem. The proposed algorithm is evaluated on a Mendeley web phishing dataset, from which three different data imbalance scenarios are generated. Despite a significant training time, it results in better geometric mean, index of balanced accuracy, F1 score, and area under the ROC curve than other DL-based classifiers combined with data-level sampling techniques in all test cases.