Abstract. The Arctic sea ice extent (SIE) is declining annually due to global warming. According to Intergovernmental Panel on Climate Change (IPCC) climate model projections, the summer Arctic will be nearly sea ice free in the 2050s, resulting in sea level rise and thus affecting human life. Therefore, it is important to predict SIE accurately. In most current studies, deep learning-based SIE prediction models focus on single-step prediction; they have short lead times and limited forecasting skill. In addition, these models often lack interpretability. In this study, we construct the Ice Temporal Fusion Transformer (IceTFT) model, which consists mainly of a variable selection network (VSN), a long short-term memory (LSTM) encoder, and a multi-headed attention mechanism. We select 11 predictors for the IceTFT model, including SIE, atmospheric variables, and ocean variables, according to the physical mechanisms influencing sea ice development. The VSN in IceTFT can automatically adjust the weights of the predictors and filter out spuriously correlated variables. We also evaluate the IceTFT model with respect to the division of the training set, the slicing method of the input data, and the input length. The IceTFT model directly generates 12-month SIE predictions with average monthly prediction errors of less than 0.21 × 10⁶ km². It also predicts the September SIE nine months in advance with a prediction error of less than 0.1 × 10⁶ km², which is superior to the models from the Sea Ice Outlook (SIO). Furthermore, we analyze the sensitivity of the SIE prediction to the selected predictors, which verifies that the IceTFT model has some physical interpretability. The variable sensitivities also provide a reference for understanding the mechanisms governing sea ice development and for selecting assimilation variables in dynamic models.
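
As a rough illustration of how a variable selection step can reweight predictors, the sketch below applies a softmax to per-predictor relevance scores and forms a weighted sum of the predictors' feature vectors. It is a simplified stand-in, not the IceTFT implementation: the function names `softmax` and `select_variables`, the fixed `scores`, and the toy feature values are all assumptions made for illustration. In a trained model the scores would be learned rather than supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_variables(features, scores):
    """Weight each predictor's feature vector by its softmax score and sum.

    features: one equal-length feature vector per predictor (hypothetical).
    scores:   one relevance score per predictor (hand-set here; a real
              variable selection network would learn them).
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per predictor")
    weights = softmax(scores)
    dim = len(features[0])
    combined = [
        sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)
    ]
    return weights, combined


# Toy, made-up values for three predictors; not data from the study.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, -2.0]
weights, combined = select_variables(features, scores)
```

Predictors with low scores receive weights close to zero, which is the sense in which such a step can suppress weakly relevant or spuriously correlated inputs; the weights themselves can then be inspected as a simple interpretability signal.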