Integrated with remote sensing technology, deep learning has been increasingly used for rapid damage assessment. Although the approach reportedly achieves high accuracy, it requires numerous training samples to maintain its performance, and in the emergency response phase such samples are often unavailable. Because no ground-truth data exist for the event at hand, deep learning models cannot be trained for that specific situation and must instead be applied to unseen data. Previous research has applied transfer learning techniques to address this data scarcity, but many studies do not accurately reflect rapid damage mapping as it occurs in real-world scenarios. The present study illustrates the use of Earth observation and deep learning technologies to predict damage in realistic emergency response settings. To this end, we conducted extensive experiments on historical data, examining multiple neural network architectures and loss functions to identify the best model. We then evaluated the performance of the best model in predicting building damage caused by two disasters that were independent of the training samples: the 2011 Tohoku Tsunami and the 2023 Turkiye-Syria Earthquake. We found that a Transformer-based model trained with a combined Cross Entropy Loss and Focal Loss achieved the highest scores. Testing on both unseen sites shows that the model performs well on the no-damage and destroyed classes, whereas its scores drop for the intermediate damage class. We also compared our Transformer-based approach with other state-of-the-art models, specifically the xView-2 winning solution. The results show that Transformer-based models generalize stably across multi-class classification and multi-resolution imagery.
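
As context for the combined objective mentioned above, the sketch below illustrates one common way to sum Cross Entropy and Focal Loss over predicted class probabilities. This is a minimal NumPy illustration, not the paper's implementation; the weighting factor `lam` and focusing parameter `gamma` are hypothetical defaults, not values reported in the study.

```python
import numpy as np

def cross_entropy(probs, targets):
    """Per-sample cross entropy.

    probs:   (N, C) predicted class probabilities (rows sum to 1)
    targets: (N,)   integer class labels
    """
    p_t = probs[np.arange(len(targets)), targets]
    return -np.log(p_t)

def focal_loss(probs, targets, gamma=2.0):
    """Per-sample focal loss: down-weights well-classified samples
    by the modulating factor (1 - p_t) ** gamma."""
    p_t = probs[np.arange(len(targets)), targets]
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

def combined_loss(probs, targets, lam=1.0, gamma=2.0):
    """Combined objective: mean of CE + lam * Focal over the batch.
    `lam` and `gamma` are illustrative choices, not the paper's values."""
    return np.mean(
        cross_entropy(probs, targets)
        + lam * focal_loss(probs, targets, gamma)
    )

# Example: a confident correct prediction incurs a much smaller
# combined loss than an uncertain one.
probs = np.array([[0.9, 0.05, 0.05],   # confident in class 0
                  [0.4, 0.30, 0.30]])  # uncertain about class 0
targets = np.array([0, 0])
print(combined_loss(probs[:1], targets[:1]))  # small loss
print(combined_loss(probs[1:], targets[1:]))  # larger loss
```

The focal term matters for damage mapping because the intermediate damage classes are typically rare compared with no-damage pixels; the `(1 - p_t) ** gamma` factor reduces the contribution of easy, abundant samples so the gradient is not dominated by them.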