Visual crowding is a phenomenon with far-reaching implications in both perceptual (e.g., object recognition and reading) and clinical (e.g., developmental dyslexia and visual agnosia) domains. Here, we combined event-related fMRI measurements and wide-field brain mapping methods to investigate whether the BOLD response evoked by visual crowding is modulated by attentional condition. Participants completed two sessions of psychophysical training outside the scanner. BOLD activity was then measured simultaneously in early visual areas and in the visual word form area (VWFA) while participants viewed strongly crowded and weakly crowded Gabor patches in attended and unattended conditions. We found that crowding increased BOLD activity in a network of areas including V1, V2, V3A, V4/V8, and VWFA. In V4/V8 and VWFA, we also found increased activity related to attention. The effect of crowding in V1 was observed only when attention was fully devoted to the target location. Our results provide evidence that an area beyond V1 is a likely candidate for the site of crowding, supporting the view of visual crowding as a mid-level visual phenomenon.