A prominent challenge for modern language understanding systems is the ability to answer implicit reasoning questions, where the reasoning steps required to answer the question are not explicitly mentioned in the text. In this work, we investigate why current models struggle with implicit reasoning question answering (QA) tasks, by decoupling inference of reasoning steps from their execution. We define a new task of implicit relation inference and construct a benchmark, IMPLICITRELATIONS, where, given a question, a model should output a list of concept-relation pairs in which the relations describe the implicit reasoning steps required for answering the question. Using IMPLICITRELATIONS, we evaluate models from the GPT-3 family and find that, while these models struggle on the implicit reasoning QA task, they often succeed at inferring implicit relations. This suggests that the bottleneck for answering implicit reasoning questions lies in the ability of language models to retrieve and reason over information, rather than in planning an accurate reasoning strategy.

Recent advances in QA (Lourie et al., 2021) have steered attention towards implicit reasoning QA benchmarks such as STRATEGYQA (Geva et al., 2021), OPENCSR (Lin et al., 2021), COMMONSENSEQA 2.0 (Talmor et al., 2021), CREAK (Onoe et al., 2021), and REALFP (Kalyan et al., 2021), which span a wide range of domains and reasoning skills. Still, implicit reasoning remains an open challenge, even for large language models (LMs) such as GPT-3 and PaLM (BIG-bench collab.