Abstract: In this article, we focus on inter-cell interference coordination (ICIC) techniques in heterogeneous network (HetNet) deployments, in which macro- and picocells autonomously optimize their downlink transmissions under loose coordination. We model this strategic coexistence as a multi-agent system aiming at joint interference management and cell association. Using tools from Reinforcement Learning (RL), agents (i.e., macro- and picocells) sense their environment and self-adapt based on local information so as to maximize their network performance. Specifically, we explore both time- and frequency-domain ICIC scenarios and propose a two-level RL formulation. Here, picocells learn their optimal cell range expansion (CRE) bias and transmit power allocation, as well as appropriate frequency bands for multi-flow transmissions, whereby a user equipment (UE) can be simultaneously served by two or more base stations (BSs) from the macro- and pico-layers. To substantiate our theoretical findings, Long Term Evolution-Advanced (LTE-A) based system-level simulations are carried out, in which our proposed approaches are compared with a number of baseline approaches, such as resource partitioning (RP), static CRE, and single-flow carrier aggregation (CA). Our proposed solutions yield substantial gains of up to 125% in average UE throughput compared to static ICIC approaches in the time domain, and gains of up to 240% in cell-edge UE throughput in the frequency domain.
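For illustration only, the sketch below shows how a single picocell agent of the kind summarized above might select its CRE bias with a tabular Q-learning rule. This is a minimal sketch under assumptions, not the formulation developed in the paper: the action set CRE_BIAS_DB, the PicoAgent class, the hyperparameters, and the hypothetical simulator interface (env.observe, env.apply_bias) are all introduced here for the example, and the reward is simply assumed to be a measured cell throughput.

```python
# Minimal, illustrative sketch (assumptions, not the paper's exact method):
# one picocell agent selecting a cell range expansion (CRE) bias via tabular
# Q-learning. State discretization, reward signal, and hyperparameters are
# placeholders chosen only for this example.
import random
from collections import defaultdict

CRE_BIAS_DB = [0, 3, 6, 9, 12]  # assumed candidate CRE bias values (dB)


class PicoAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q-table: (state, action index) -> estimated long-term utility
        self.q = defaultdict(float)

    def select_bias(self, state):
        """Epsilon-greedy choice over the discrete CRE bias set."""
        if random.random() < self.epsilon:
            return random.randrange(len(CRE_BIAS_DB))
        return max(range(len(CRE_BIAS_DB)), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward the temporal-difference target."""
        best_next = max(self.q[(next_state, a)] for a in range(len(CRE_BIAS_DB)))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def run_episode(agent, env, steps=100):
    """Interaction loop against a hypothetical network simulator interface."""
    state = env.observe()  # e.g., quantized local load/interference level (assumed)
    for _ in range(steps):
        action = agent.select_bias(state)
        # Hypothetical call: apply the bias, get a throughput-based reward back.
        reward, next_state = env.apply_bias(CRE_BIAS_DB[action])
        agent.update(state, action, reward, next_state)
        state = next_state
```

In a multi-agent setting such as the one described in the abstract, each picocell would run its own instance of such an agent and adapt based only on locally observable information.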