Objective: Energy expenditure (EE) estimation plays an important role in objectively evaluating physical activity and its impact on human health. EE during activity depends on many factors, including activity intensity, individual physical and physiological characteristics, and the environment. However, current studies estimate EE from very limited information, such as heart rate (HR) and step count, which limits estimation accuracy. Methods: In this study, we proposed a deep multi-branch two-stage regression network (DMTRN) that fuses several related sources of information (motion information, physiological characteristics, and human physical information) and thereby substantially improves EE estimation accuracy. The DMTRN consists of two main modules: a multi-branch convolutional neural network module that extracts multi-scale context features from electrocardiogram (ECG) and inertial measurement unit (IMU) data, and a two-stage regression module that aggregates these context features, which carry the physiological and motion information, with anthropometric features to estimate EE. Results: Experiments on 33 participants show that the proposed method is more accurate than previous works, reducing the average root mean square error (RMSE) by 22.8%. Conclusion: The proposed DMTRN improved EE estimation accuracy through its network structure and the use of ECG as a new input signal. Significance: This study showed that ECG is considerably more effective than HR for EE estimation and sheds light on EE estimation using deep learning methods.
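The Methods summary names the two DMTRN modules but not their internals, so the following is only a minimal, dependency-free sketch of the general idea, not the authors' implementation. Everything concrete here is a hypothetical assumption: the branch kernel sizes (3, 7, 15), the use of one ECG lead plus three IMU axes, the ReLU and global-average-pooling steps, the specific stage split (stage 1 regresses an intermediate estimate from the signal features; stage 2 fuses it with anthropometric features), the function names, and the random, untrained weights. It also includes an `rmse` helper for the metric reported in Results.

```python
import math
import random

# Hypothetical kernel sizes: one per branch. Different sizes give different
# temporal receptive fields, one common way to obtain "multi-scale" features.
KERNEL_SIZES = (3, 7, 15)


def conv1d_relu_gap(signal, kernel, bias):
    """Valid 1-D cross-correlation, then ReLU, then global average pooling."""
    k = len(kernel)
    outputs = []
    for start in range(len(signal) - k + 1):
        acc = bias + sum(w * x for w, x in zip(kernel, signal[start:start + k]))
        outputs.append(max(acc, 0.0))  # ReLU
    return sum(outputs) / len(outputs)  # global average pooling


def make_branch_params(rng, n_channels):
    """Random (untrained) weights: one kernel per (branch, channel) pair."""
    return [
        [([rng.gauss(0.0, 0.1) for _ in range(k)], 0.0) for _ in range(n_channels)]
        for k in KERNEL_SIZES
    ]


def multibranch_features(channels, params):
    """Concatenate pooled features from every branch and every input channel."""
    feats = []
    for branch in params:
        for signal, (kernel, bias) in zip(channels, branch):
            feats.append(conv1d_relu_gap(signal, kernel, bias))
    return feats


def linear(x, weights, bias):
    return bias + sum(w * v for w, v in zip(weights, x))


def estimate_ee(ecg, imu_axes, anthropometrics, rng):
    """Hypothetical two-stage regression; returns (estimate, n_context_features)."""
    channels = [ecg] + imu_axes
    params = make_branch_params(rng, len(channels))
    context = multibranch_features(channels, params)
    # Stage 1 (assumed): intermediate estimate from ECG/IMU context features.
    w1 = [rng.gauss(0.0, 0.1) for _ in context]
    stage1 = linear(context, w1, 0.0)
    # Stage 2 (assumed): fuse the stage-1 output with anthropometric features.
    fused = [stage1] + anthropometrics
    w2 = [rng.gauss(0.0, 0.1) for _ in fused]
    return linear(fused, w2, 0.0), len(context)


def rmse(y_true, y_pred):
    """Root mean square error between reference and estimated EE values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Usage with synthetic signals; the weights are untrained, so the EE value
# itself is meaningless and only the data flow is illustrated.
rng = random.Random(0)
ecg = [math.sin(0.3 * t) for t in range(200)]
imu = [[math.cos(0.1 * t * (a + 1)) for t in range(200)] for a in range(3)]
anthro = [0.35, 0.70, 0.60]  # hypothetical normalized anthropometric inputs
ee, n_feats = estimate_ee(ecg, imu, anthro, rng)
print(n_feats)  # 12 = 3 branches x 4 channels
```

In a real model the kernels and regression weights would be learned jointly by minimizing a loss such as squared error against reference EE, and each branch would typically stack several convolutional layers rather than one.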