Recently, graph neural networks (GNNs) have been applied to emotion-cause pair extraction (ECPE) because of their strong representation learning ability. However, current GNN-based ECPE methods mostly model local dependency relations between homogeneous nodes at the semantic granularity of clauses or clause pairs, and fail to take full advantage of the rich semantic information in the document. To solve this problem, we propose a novel hierarchical heterogeneous graph attention network that models global semantic relations among nodes. In particular, our method introduces all types of semantic elements involved in ECPE, not just clauses or clause pairs. Specifically, we first model the dependency between clauses and words, with word nodes also serving as intermediaries for the associations between clause nodes. Second, we construct a pair-level subgraph to explore the correlation between pair nodes and their different types of neighboring nodes. Representations of clauses and clause pairs are learned by two-level heterogeneous graph attention networks. Experiments on the benchmark datasets show that our proposed model achieves a significant improvement over 13 competing methods.
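To make the two-level idea concrete, the following is a minimal, illustrative sketch and not the model itself. It assumes scaled dot-product attention with a residual update as a stand-in for the heterogeneous graph attention layers, uses only the two clauses of a pair as that pair node's neighbors, and introduces hypothetical helper names (`attend`, `residual_update`, `clause_level`, `pair_level`) and toy two-dimensional vectors purely for illustration.

```python
import math
from itertools import product


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, neighbors):
    """Attention-weighted average of neighbor vectors, scored against target.

    Assumption: scaled dot-product scoring, chosen only for brevity.
    """
    scale = math.sqrt(len(target))
    scores = [sum(t * n for t, n in zip(target, vec)) / scale for vec in neighbors]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, neighbors))
            for i in range(len(target))]


def residual_update(node, neighbors):
    """Add the attended neighbor summary to the node's own vector."""
    if not neighbors:
        return list(node)
    return [a + b for a, b in zip(node, attend(node, neighbors))]


def clause_level(clause_vecs, word_vecs, clause_words):
    """Level 1: word nodes read from clauses, then clause nodes read from words."""
    new_words = {
        w: residual_update(
            vec, [clause_vecs[c] for c, ws in clause_words.items() if w in ws]
        )
        for w, vec in word_vecs.items()
    }
    # A word shared by two clauses now carries information between them.
    return {
        c: residual_update(vec, [new_words[w] for w in clause_words[c]])
        for c, vec in clause_vecs.items()
    }


def pair_level(clause_vecs):
    """Level 2: each candidate (emotion, cause) pair node reads from its two clauses."""
    pairs = {}
    for i, j in product(clause_vecs, repeat=2):
        # Assumption: a pair node is initialized as the mean of its two clauses.
        init = [(a + b) / 2 for a, b in zip(clause_vecs[i], clause_vecs[j])]
        pairs[(i, j)] = residual_update(init, [clause_vecs[i], clause_vecs[j]])
    return pairs


# Toy document (made-up vectors): two clauses sharing the word "happy".
clause_vecs = {0: [1.0, 0.0], 1: [0.0, 1.0]}
word_vecs = {"happy": [1.0, 1.0], "won": [0.5, -0.5]}
clause_words = {0: ["happy", "won"], 1: ["happy"]}

clauses = clause_level(clause_vecs, word_vecs, clause_words)
pairs = pair_level(clauses)
print(len(pairs), "pair representations of dimension", len(pairs[(0, 0)]))
```

In this sketch the shared word "happy" is the only path by which information flows between the two clauses at the first level, which mirrors the intermediary role of word nodes described above. A real implementation would use learned projections per node and edge type rather than raw dot products.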