Many of the texts encountered daily are connected to one another. For example, Wikipedia articles refer to other articles via hyperlinks; scientific papers relate to one another via citations or (co)authors; tweets relate via users who follow each other or reshare content. Hence, a graph-like structure can represent these connections and be seen as capturing the “context” of the texts. The question thus arises whether extracting such context information and integrating it into a language model might facilitate a better automated understanding of the text. In this study, we experimentally demonstrate that incorporating graph-based contextualization into the BERT model enhances its performance on a classification task. Specifically, on the Pubmed dataset, we observed a reduction in balanced mean error from 8.51% to 7.96%, while increasing the number of parameters by only 1.6%.