In Simon-type interference tasks, participants are asked to make a 2-choice response to one stimulus dimension while ignoring the stimulus position. Robust congruency effects are commonly found; that is, responses are faster when the response assigned to the relevant stimulus attribute matches the location of the stimulus. Simon congruency effects are regularly attributed to a fast, nonverbal processing route. In 3 experiments, we tested the importance of verbal representations in the Simon effect by manipulating the format of representations (verbal vs. nonverbal) through stimulus material (words vs. gratings) and stimulus arrangement (horizontal vs. vertical). The results of Experiment 1 indicate that both factors modulated the Simon effect when they were manipulated between participants, to the point of an inversion of the Simon effect for words presented in a vertical arrangement. We replicated this inverse congruency effect for verbal material in a vertical arrangement when a within-participant design was used (Experiment 2) and when the impact of reading processes was ruled out (Experiment 3). One cause of this inversion might be the construction of language-based representations that counteract automatic processing given the stimulus arrangement. To investigate this possibility, we assessed individual differences in the use of inner speech for self-instruction. Using hierarchical linear modeling, we found that self-rated evaluative and motivational inner speech processes accounted for a significant portion of the Simon effect. This finding supports claims that individual differences predict performance even in simple cognitive tasks such as the Simon task, and it highlights the flexibility of basic cognitive processes.
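
For readers unfamiliar with the measure, the congruency effect referred to above is conventionally expressed as the difference between mean reaction times on incongruent and congruent trials, so that an inverted effect appears as a negative value. The following is a minimal sketch of that arithmetic only and is not the authors' analysis pipeline. The function name, the field names (`stimulus_position`, `response_position`, `rt_ms`), and the reaction times are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def simon_effect(trials):
    """Return mean incongruent RT minus mean congruent RT, in ms.

    A trial counts as congruent when the (task-irrelevant) stimulus
    position matches the position of the assigned response.
    Positive values indicate a regular Simon effect; negative values
    indicate an inverted effect.
    """
    congruent = [t["rt_ms"] for t in trials
                 if t["stimulus_position"] == t["response_position"]]
    incongruent = [t["rt_ms"] for t in trials
                   if t["stimulus_position"] != t["response_position"]]
    return mean(incongruent) - mean(congruent)


# Made-up illustrative trials (not data from the study).
regular_trials = [
    {"stimulus_position": "left", "response_position": "left", "rt_ms": 420},
    {"stimulus_position": "right", "response_position": "right", "rt_ms": 440},
    {"stimulus_position": "left", "response_position": "right", "rt_ms": 470},
    {"stimulus_position": "right", "response_position": "left", "rt_ms": 450},
]

# Made-up trials in which congruent responses are slower (an inversion).
inverted_trials = [
    {"stimulus_position": "top", "response_position": "top", "rt_ms": 480},
    {"stimulus_position": "bottom", "response_position": "bottom", "rt_ms": 460},
    {"stimulus_position": "top", "response_position": "bottom", "rt_ms": 450},
    {"stimulus_position": "bottom", "response_position": "top", "rt_ms": 430},
]

regular_effect = simon_effect(regular_trials)
inverted_effect = simon_effect(inverted_trials)
```

In practice, such a difference score would be computed per participant and condition before any modeling of individual differences; the sketch omits error trials, outlier trimming, and other preprocessing choices that the abstract does not specify.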