Graph Neural Networks (GNNs) have become increasingly popular for their ability to capture complex relationships within graphs by aggregating information from each node’s neighbors. However, in graphs exhibiting high levels of heterophily, relevant distant nodes are missed during neighbor aggregation, which limits GNN performance in tasks such as node classification. To incorporate relevant long-range neighbors into the GNN aggregation mechanism, this paper introduces the Overlay Graph Neural Networks (OGN) model. OGN is inspired by P2P overlay networks, where the idea is to find, for a given node (a peer), neighbor peers that, although not directly connected to it, are semantically similar and could improve both query routing and query results. In our context, the network is the graph, and the routing is the message passing that a GNN performs to aggregate node features. OGNs are built by stacking one or more overlay layers, each taking as input the graph and a node feature matrix that is either available or derivable (e.g., by analyzing the graph’s structure). Each overlay layer combines base embeddings, learned from node features and short-range neighbors, with overlay embeddings, computed by projecting nodes with similar features close to one another in an overlay space and then aggregating overlay neighbors via a sliding-window attention mechanism. Combining base and overlay embeddings captures both the immediate and the global context of each node in the graph. We evaluate OGN on node classification using state-of-the-art benchmarks and show that it is competitive, with the added advantage of being easily portable to any existing GNN model.
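
To make the overlay layer described above more concrete, the following is a minimal NumPy sketch and not the OGN implementation. Everything beyond the general idea is an assumption made for illustration: the overlay space is taken to be one-dimensional and obtained with a fixed projection vector `proj` (a trained model would presumably learn the projection), base embeddings use plain mean aggregation over direct neighbors, the sliding window has a hypothetical radius `window` and uses scaled dot-product attention, and the two embeddings are combined by concatenation. The function names are hypothetical as well.

```python
import numpy as np

def base_embeddings(adj, x):
    """Mean of each node's own features and those of its direct neighbors
    (assumed stand-in for the short-range GNN aggregation)."""
    x = np.asarray(x, dtype=float)
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(x))  # add self-loops
    return (a_hat @ x) / a_hat.sum(axis=1, keepdims=True)

def overlay_embeddings(x, proj, window=1):
    """Sliding-window attention over nodes ordered by a 1-D overlay
    coordinate (assumption: overlay position = x @ proj)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    order = np.argsort(x @ proj, kind="stable")  # similar nodes end up adjacent
    xs = x[order]
    out = np.zeros_like(xs)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neigh = xs[lo:hi]                        # overlay neighbors of node i
        scores = neigh @ xs[i] / np.sqrt(d)      # scaled dot-product scores
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ neigh
    result = np.empty_like(out)
    result[order] = out                          # restore original node order
    return result

def overlay_layer(adj, x, proj, window=1):
    """Concatenate base and overlay embeddings (assumed combination rule)."""
    return np.concatenate(
        [base_embeddings(adj, x), overlay_embeddings(x, proj, window)], axis=1
    )

# Path graph 0-1-2-3: nodes 0 and 3 have similar features but are 3 hops apart.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
x = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.1], [1.1, 0.0]])
h = overlay_layer(adj, x, proj=np.array([1.0, 0.0]), window=1)
```

In this toy graph, nodes 0 and 3 are never aggregated together by the one-hop base step, but they become adjacent once sorted by their overlay coordinate, so the windowed attention lets them exchange information despite the graph distance between them.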