The datasets most widely used for abusive language detection contain lists of messages, usually tweets, that have been manually judged as abusive or not by one or more annotators, with the annotation performed at the message level. In this paper, we investigate what happens when the hateful content of a message is judged also on the basis of its context, given that messages are often ambiguous and need to be interpreted in the context in which they occur. We first re-annotate part of a widely used dataset for abusive language detection in English in two conditions, i.e. with and without context. Then, we compare the performance of three classification algorithms on these two versions of the dataset, arguing that context-aware classification is more challenging but also closer to a real application scenario.

'Abuse is contextual' is one of the key claims reported in (Prabhakaran et al., 2020), where the authors describe the content of a panel with NLP practitioners and human rights experts at the Human Rights Conference RightsCon 2020. A similar remark is made in a recent position paper on current NLP methods to fight online abuse (Jurgens et al., 2019), which argues that NLP research tends to consider a narrow scope of what constitutes abuse, without respecting, for instance, community norms in classification decisions. Similarly, Vidgen et al. (2019) claim that one of the main research challenges in abusive language detection is accounting for context, although they focus mainly on user-level variables and network representations.

In this work, we present an analysis aimed at better understanding what context is in abusive language detection, and what its effects are on data annotation and classification. We focus on discourse context, i.e. the messages preceding a given tweet, because it has been relatively understudied in abusive language detection, while user-level and network-level features have already been extensively discussed in past works (Fehn Unsvåg and