Most benchmarks designed for question answering over knowledge bases (KBQA) operate under the i.i.d. assumption, where the same schema items are encountered during inference as those observed during training. Recently, the GrailQA dataset was established to evaluate the zero-shot generalization capabilities of KBQA models as a departure from the i.i.d. assumption. The reasonable performance of current KBQA systems on the GrailQA zero-shot split hints that the field might be moving towards more generalizable systems. In this work, we observe a bias in the GrailQA dataset towards simpler one- or two-hop questions, which results in an inaccurate assessment of this generalization ability. We propose GrailQA++, a challenging zero-shot KBQA test set that contains more questions relying on complex reasoning. We leverage the concept of graph isomorphisms to control the complexity of the questions and to ensure that our proposed test set has a fair distribution of simple and complex questions. Existing KBQA models suffer a substantial drop in performance on our new test set compared to the GrailQA zero-shot split. Our analysis reveals how isomorphisms can be used to understand the complementary strengths of different KBQA models and to provide deeper insight into model mispredictions. Overall, our paper highlights the limited generalizability of existing models and the need for more challenging benchmarks. Our dataset is available at https://github.com/sopankhosla/GrailQA-PlusPlus.
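To make the role of isomorphisms concrete, the following minimal sketch illustrates how query graphs could be grouped into structural isomorphism classes (complexity buckets); the graphs, relation names, and bucketing loop are hypothetical illustrations using networkx, not the exact procedure used to construct GrailQA++.

```python
# Illustrative sketch: grouping query graphs by structural isomorphism.
# The shapes and bucketing logic are hypothetical, not the GrailQA++ pipeline.
import networkx as nx

def query_graph(edges):
    """Build a directed query graph from (head, relation, tail) triples."""
    g = nx.DiGraph()
    for head, rel, tail in edges:
        g.add_edge(head, tail, relation=rel)
    return g

# A one-hop question and two two-hop questions (hypothetical logical forms).
one_hop = query_graph([("?x", "r1", "answer")])
two_hop = query_graph([("?x", "r1", "?y"), ("?y", "r2", "answer")])
two_hop_other = query_graph([("?a", "s1", "?b"), ("?b", "s2", "answer")])

# Structural isomorphism ignores the specific schema items and keeps
# only the shape of the reasoning pattern.
print(nx.is_isomorphic(one_hop, two_hop))        # False: different shapes
print(nx.is_isomorphic(two_hop, two_hop_other))  # True: same two-hop chain

def bucket_by_shape(graphs):
    """Group query graphs into isomorphism classes (complexity buckets)."""
    buckets = []
    for g in graphs:
        for bucket in buckets:
            if nx.is_isomorphic(g, bucket[0]):
                bucket.append(g)
                break
        else:
            buckets.append([g])
    return buckets

shapes = bucket_by_shape([one_hop, two_hop, two_hop_other])
print(len(shapes))  # 2 isomorphism classes -> 2 complexity buckets
```

Grouping questions by such isomorphism classes is one way to balance a test set across simple and complex reasoning patterns, independently of which particular schema items appear in each question.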