In recent years, object counting has made significant progress in surveillance-view scenes. However, only a few works address object density estimation in remote sensing, and the performance of existing methods remains unsatisfactory. On the one hand, the imbalanced distribution of targets in remote sensing images can cause the model to collapse, leading to severe performance degradation. On the other hand, the scale of targets in remote sensing images varies widely in real scenarios, which makes accurate counting challenging. To address these problems, we propose an approach named “SwinCounter” for object counting in remote sensing. We introduce a Balanced MSE Loss that pays more attention to under-represented samples, which alleviates the problem of imbalanced object labels. In addition, the attention mechanism in SwinCounter captures multi-scale information, so the model is aware of objects at different scales and captures small and dense targets more precisely. We conduct experiments on the RSOC dataset, achieving MAEs of 7.2, 151.5, 14.38, and 52.88 and MSEs of 10.1, 436.0, 22.7, and 74.82 on the Building, Small-Vehicle, Large-Vehicle, and Ship sub-datasets, respectively. These results demonstrate the competitiveness and superiority of the proposed method.
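To make the reported metrics concrete, the following is a minimal sketch of how count-level MAE and MSE are often computed in density-based counting. It rests on two assumptions that the text above does not state: that a predicted count is obtained by summing a predicted density map, and that "MSE" follows the convention common in the counting literature of reporting the square root of the mean squared count error. The function names `count_from_density` and `counting_errors` are hypothetical and are not taken from SwinCounter.

```python
import math


def count_from_density(density_map):
    """Predicted object count, assumed to be the sum over a 2D density map."""
    return sum(sum(row) for row in density_map)


def counting_errors(pred_counts, true_counts):
    """Return (MAE, root-mean-squared error) over per-image counts.

    Assumption: the second value matches what counting papers often label
    "MSE", i.e. the square root of the mean squared count error.
    """
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("need two non-empty count lists of equal length")
    diffs = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Toy data for illustration only, not results from the RSOC dataset.
toy_density = [[0.5] * 4 for _ in range(4)]  # sums to 8.0 objects
pred = [count_from_density(toy_density), 12.0, 30.0]
true = [10.0, 12.0, 27.0]
mae, rmse = counting_errors(pred, true)
```

Because the second metric squares each per-image error before averaging, it penalizes large miscounts on individual images more heavily than MAE does, which is consistent with the MSE values above being larger than the corresponding MAE values on every sub-dataset.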