“…For example, malicious node detection, a key application of graph machine learning, is known to be non-homophilous in many settings [55,13,25,11]. Further, while new GNNs that work better in these non-homophilous settings have been developed [82,44,81,17,15,73,36,35,9,54], their evaluation is limited to a few graph datasets used by Pei et al. [58] (collected by [61,66,48]) that have certain undesirable properties such as small size, narrow range of application areas, and high variance between different train/test splits [82]. Consequently, method scalability has not been thoroughly studied in non-homophilous graph learning.…”