In this paper, we study graph-to-image generation conditioned exclusively on scene graphs, seeking to disentangle the hidden semantics shared between knowledge graphs and images. While most existing research relies on labor-intensive auxiliary information such as object layouts or segmentation masks, it is also of interest to probe the generality of a model under limited supervision, which additionally avoids extra cross-modal alignment. To tackle this challenge, we delve into the causality of the adversarial generation process and derive a new principle that simultaneously achieves semantic disentanglement and alignment between the target and model distributions. We name this principle knowledge consensus; it explicitly describes a triangular causal dependency among observed images, graph semantics, and hidden visual representations. The consensus in turn determines a new graph-to-image generation framework, instantiated through several adversarial optimization objectives. Extensive experimental results demonstrate that, even when conditioned only on scene graphs, our model achieves surprisingly strong performance on semantics-aware image generation, while retaining the ability to manipulate the generation through knowledge graphs.
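For concreteness, the following is a minimal illustrative sketch of one standard conditional-adversarial formalization consistent with the description above; the symbols $x$ (observed image), $g$ (graph semantics), and $z$ (hidden visual representation), the factorization, and the min--max objective are assumptions made for exposition, not the exact knowledge-consensus objectives developed in the paper. Under a triangular dependency in which $z$ is drawn from the graph semantics and $x$ is generated from both, one plausible reading is
% Illustrative only: this factorization and generic conditional GAN
% objective are assumed for exposition; the paper's actual objectives
% may couple these terms differently.
\begin{align}
  p(x, z \mid g) &= p(z \mid g)\, p(x \mid z, g), \\
  \min_{G} \max_{D} \;
  & \mathbb{E}_{(x, g) \sim p_{\mathrm{data}}}\!\left[ \log D(x, g) \right]
  + \mathbb{E}_{g \sim p_{\mathrm{data}},\; z \sim p(z \mid g)}\!\left[ \log\!\left( 1 - D\!\left( G(z, g), g \right) \right) \right].
\end{align}
Here the discriminator $D$ scores image--graph pairs, so minimizing the objective over the generator $G$ aligns the model distribution $p(x \mid g)$ with the target distribution; this sketch shows only the generic form of such an alignment, without the disentanglement machinery the paper introduces.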