Emotion-Cause Pair Extraction (ECPE) aims to extract emotions and their corresponding causes from a document. Strong emotion extraction and cause extraction abilities have proven essential for accurate ECPE. However, most existing methods learn shared features for emotion extraction and cause extraction, which can harm performance on both tasks because each relies on different information (i.e., task-specific features). Moreover, shared feature learning of the two tasks also leads to a label imbalance problem. To address these issues, this paper proposes a multi-task learning framework named Hierarchical Shared Encoder with Task-specific Transformer Layer Selection (HSE-TTLS). The model performs ECPE via two subtasks: Emotion Extraction (EE) and Emotion Cause Extraction (ECE). This two-subtask design reflects the fact that cause clauses are emotion-dependent, and it significantly alleviates the label imbalance problem. To effectively extract task-specific features for EE and ECE, we employ BERT as the token-level encoder and select the optimal task-specific layers for each of the two subtasks. Focal loss is used as the objective function for EE to further alleviate the label imbalance problem. Extensive experiments on a benchmark ECPE corpus demonstrate the effectiveness of HSE-TTLS, which outperforms state-of-the-art baseline methods by at least 1.56% in F1 score.
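
To illustrate why focal loss helps with label imbalance in EE, the following is a minimal sketch of the standard binary focal loss for a single clause-level prediction. The abstract does not state which variant or hyperparameters HSE-TTLS uses, so the function names and the `gamma` and `alpha` defaults here are illustrative assumptions, not the paper's settings.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive (emotion) class.
    y: gold label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the gold class
    p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are assumed values for illustration only.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma = 0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy. Larger `gamma` values concentrate training on hard, misclassified clauses, which is the property that makes the loss useful when non-emotion clauses dominate.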