Graph Neural Networks (GNNs) have achieved great success in learning graph representations, thereby facilitating various graph-related tasks. However, most GNN methods adopt a supervised learning setting, which is not always feasible in real-world applications due to the difficulty of obtaining labeled data. Hence, graph self-supervised learning has been attracting increasing attention. Graph contrastive learning (GCL) is a representative framework for self-supervised learning. In general, GCL learns node representations by contrasting semantically similar nodes (positive samples) and dissimilar nodes (negative samples) with anchor nodes. Without access to labels, positive samples are typically generated by data augmentation, and negative samples are uniformly sampled from the entire graph, which leads to a sub-optimal objective. Specifically, data augmentation naturally limits the number of positive samples involved in the process (typically only one positive sample is adopted). On the other hand, the random sampling process inevitably selects false-negative samples (samples sharing the same semantics as the anchor). These issues limit the learning capability of GCL. In this work, we propose an enhanced objective that addresses the aforementioned issues. We first introduce an unachievable ideal objective that contains all positive samples and no false-negative samples. This ideal objective is then transformed into a probabilistic form based on the distributions for sampling positive and negative samples. We then model these distributions with node similarity and derive the enhanced objective. Comprehensive experiments on various datasets demonstrate the effectiveness of the proposed enhanced objective under different settings.
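To make the conventional setup concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss with one positive sample and uniformly sampled negatives. It is an illustration of the baseline behavior described above, not the enhanced objective proposed in this work. The function names (`cosine`, `info_nce`, `sample_negatives`), the temperature `tau`, and the toy 2-D embeddings are all assumptions introduced here for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, tau=0.5):
    """Generic InfoNCE-style loss: one anchor, one positive, several negatives.

    tau is a hypothetical temperature; it is not taken from the paper.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def sample_negatives(anchor_idx, num_nodes, k, rng):
    """Uniformly sample k distinct node indices, excluding only the anchor.

    Because no labels are used, nodes that share the anchor's semantics
    (false negatives) can be drawn.
    """
    candidates = [i for i in range(num_nodes) if i != anchor_idx]
    return rng.sample(candidates, k)


# Toy 2-D embeddings (assumed values, for illustration only).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]            # augmented view of the anchor
clean_negatives = [[-1.0, 0.0], [0.0, -1.0]]
# Same set, but one negative is semantically close to the anchor.
noisy_negatives = [[1.0, 0.05], [0.0, -1.0]]

loss_clean = info_nce(anchor, positive, clean_negatives)
loss_noisy = info_nce(anchor, positive, noisy_negatives)
```

In this toy setting, `loss_noisy` exceeds `loss_clean`: the false negative sits close to the anchor, so the loss penalizes a similarity that is actually desirable. This is the effect that motivates removing false negatives from the objective.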
INTRODUCTION
Graphs are an essential data structure for representing many kinds of real-world data, such as social networks [8,11], transportation networks [37], and chemical molecules [21,43]. Many real-world applications based on these data can be naturally treated as computational tasks on graphs. To facilitate these graph-related tasks, it is essential to learn high-quality vector representations for graphs and their components. Graph neural networks (GNNs) [23,41,44], which generalize deep neural networks to graphs, have demonstrated their great power in graph representation learning,