Warning: this paper contains content that may be offensive and distressing.

State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which harms their domain invariance. Prior work has attempted to penalize terms from manually curated lists of hate-speech-related words using feature-attribution methods, which quantify the importance the classifier assigns to input terms when making a prediction. We instead propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, together with feature-attribution scores for the hate-speech classes, yielding consistent improvements in cross-domain evaluation.
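To make the idea concrete, the following is a minimal sketch of how such an attribution penalty could be wired into a training step. It assumes gradient-times-input as the feature-attribution method and a token-level domain classifier whose high-confidence tokens are treated as source-specific; all module names, the penalty weight, and the confidence threshold are hypothetical illustrations, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim, n_domains = 1000, 64, 2
embed = nn.Embedding(vocab_size, dim)
hate_clf = nn.Linear(dim, 2)            # toy mean-pooled hate-speech head
domain_clf = nn.Linear(dim, n_domains)  # toy token-level domain head
lam = 0.1                               # penalty weight (hypothetical)

tokens = torch.randint(0, vocab_size, (8, 16))  # toy batch of token ids
labels = torch.randint(0, 2, (8,))              # hate / non-hate labels

emb = embed(tokens)                             # (batch, seq, dim)
logits = hate_clf(emb.mean(dim=1))
cls_loss = F.cross_entropy(logits, labels)

# Gradient-times-input as a simple stand-in for feature attribution;
# create_graph=True keeps the attribution scores differentiable so the
# penalty can be trained through.
grads = torch.autograd.grad(cls_loss, emb, create_graph=True)[0]
attributions = (grads * emb).sum(dim=-1).abs()  # (batch, seq)

# Tokens the domain classifier recognizes with high confidence are
# treated as source-specific; their attribution mass gets penalized.
with torch.no_grad():
    domain_conf = domain_clf(emb).softmax(dim=-1).amax(dim=-1)
    source_mask = (domain_conf > 0.9).float()   # threshold is illustrative

loss = cls_loss + lam * (attributions * source_mask).mean()
loss.backward()  # pushes the classifier to down-weight source-specific terms
```

The key design point this sketch captures is that the source-specific term list is not manually curated: the domain classifier supplies it automatically, and the penalty only touches the attribution the hate-speech head places on those tokens.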