Heterogeneous networks (HetNets), in which small cells overlay macro cells, are a cost-effective approach to increasing the capacity of cellular networks. However, HetNets raise new issues related to cell association and interference management. In particular, the optimal configuration of interference coordination (IC) parameters is a challenging task because it depends on multiple stochastic processes, such as the locations of the users, the traffic demands, and the strength of the received signals. This work proposes a self-optimization algorithm capable of finding the optimal configuration in an operating network. We address the problem using a reinforcement learning (RL) approach, in which the actions are the IC configurations, whose performance is initially unknown. The main difficulty is that, due to the variable network conditions, the performance of each action may change over time. Our proposal is based on two main elements: the sequential exploration of subsets of actions (exploration regions), and an optimal stopping (OS) strategy for deciding when to end the current exploration and start a new one. For our algorithm, referred to as local exploration with optimal stopping (LEOS), we provide theoretical bounds on its long-term regret per sample and its convergence time. We compare LEOS to state-of-the-art learning algorithms based on multi-armed bandits and policy-gradient RL. Considering different rates of change in the network conditions, our numerical results show that LEOS outperforms the first alternative by 22% and the second one by 48% in terms of average regret per sample.
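To make the two elements of the abstract concrete, the following Python sketch illustrates the general shape of a local-exploration loop with a stopping rule. It is only a minimal illustration, not the authors' LEOS algorithm: the integer encoding of IC configurations, the window-based exploration region, the `patience` stopping rule, and the `reward_fn` signature are all hypothetical placeholders, whereas the paper's actual OS strategy is derived formally.

```python
import random

def leos_sketch(actions, reward_fn, region_size=3, patience=5,
                horizon=2000, seed=0):
    """Illustrative local-exploration loop with a stopping rule.

    actions: list of candidate IC configurations (hypothetical encoding).
    reward_fn(action, t): noisy, possibly time-varying reward (assumption).
    """
    rng = random.Random(seed)
    current = rng.randrange(len(actions))  # index of current best action
    t = 0
    while t < horizon:
        # Exploration region: a local window of actions around the current best.
        lo = max(0, current - region_size)
        hi = min(len(actions), current + region_size + 1)
        region = list(range(lo, hi))
        means = {a: 0.0 for a in region}
        counts = {a: 0 for a in region}
        stable, best = 0, current
        # Explore the region until the stopping rule fires. Placeholder rule:
        # stop once the empirical best is unchanged for `patience` sweeps
        # (the paper uses a formally derived optimal stopping strategy).
        while stable < patience and t < horizon:
            for a in region:
                r = reward_fn(actions[a], t)
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]  # running mean
                t += 1
            new_best = max(region, key=lambda a: means[a])
            stable = stable + 1 if new_best == best else 0
            best = new_best
        current = best  # re-center the next exploration region
    return actions[current]

if __name__ == "__main__":
    import math

    # Hypothetical reward: noisy unimodal curve whose peak drifts over time,
    # mimicking variable network conditions.
    def reward(x, t):
        peak = 5 + 3 * math.sin(t / 500.0)
        return -abs(x - peak) + random.gauss(0, 0.5)

    print("selected configuration:", leos_sketch(list(range(11)), reward))
```

Exploring only a local window rather than the full action set keeps the per-round sampling cost bounded, which is the intuition behind restricting attention to exploration regions when rewards drift over time.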