In high-performance three-dimensional Integrated Circuits (3D ICs), clock networks consume a large portion of the full-chip power. However, no previous 3D IC work has ever optimized 3D clock networks for both power and performance simultaneously, which results in sub-optimal 3D designs. To overcome this issue, in this article, we propose a GNN-based flip-flop clustering algorithm that merges single-bit flip-flops into multi-bit flip-flops in an unsupervised manner, which jointly optimizes the power and performance metrics of clock networks. Moreover, we integrate our algorithm into the state-of-the-art 3D physical design flow and verify the integration, which leads to a better 3D full-chip design. Experimental results on eight industrial benchmarks demonstrate that the algorithm achieves improvements up to 18% in total power and 8.2% in performance over the state-of-the-art 3D flow.