Both the neural activity and the behavior of highly trained animals are strikingly variable across repetitions of behavioral trials. Neural variability consistently decreases during behavioral tasks, in both sensory and motor cortices. Behavioral variability, on the other hand, changes with task difficulty and animal performance. Here we study a mechanism for such variability in spiking neural network models with cluster topologies that enable multistability and attractor dynamics, features subserving functional roles such as decision-making, (working) memory and learning. Multistable attractors have been studied in spiking neural networks through clusters of strongly interconnected excitatory neurons. However, we show that this network topology results in the loss of excitation/inhibition balance and does not confer robustness against modulation of network activity. Moreover, it leads to widely separated firing rate states of single neurons, inconsistent with experimental observations. To overcome these problems, we propose a combination of excitatory and inhibitory clustering that restores local excitation/inhibition balance. This network architecture is inspired by recent anatomical and physiological studies which point to increased local inhibitory connectivity and possible inhibitory clustering through connection strengths. We find that inhibitory clustering supports biologically realistic spiking activity in terms of firing rate, spiking irregularity, and trial-to-trial spike count variability. Furthermore, with appropriate stimulation of network clusters, this network topology enables us to qualitatively and quantitatively reproduce in vivo firing rates, variability dynamics and behavioral reaction times for different task conditions, as observed in recordings from the motor cortex of behaving monkeys.
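
The two variability measures named above, trial-to-trial spike count variability and spiking irregularity, are commonly quantified with the Fano factor and the coefficient of variation (CV) of interspike intervals. The following is a minimal sketch of these standard definitions, not the analysis pipeline of this study: the function names `fano_factor` and `cv_isi` are hypothetical, and the choice of counting window, unbiased variance estimator, and handling of empty trials are assumptions made for illustration.

```python
import numpy as np


def fano_factor(spike_counts):
    """Fano factor: variance of spike counts across trials divided by their mean.

    `spike_counts` holds one count per trial for a single neuron and a single
    counting window (the window itself is an assumption left to the caller).
    """
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        # Undefined when the neuron never fires in the window.
        return float("nan")
    # ddof=1 gives the unbiased sample variance across trials.
    return counts.var(ddof=1) / mean


def cv_isi(spike_times):
    """Coefficient of variation of interspike intervals within one spike train."""
    isis = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    if isis.size < 2 or isis.mean() == 0:
        # Too few intervals to estimate a standard deviation.
        return float("nan")
    return isis.std(ddof=1) / isis.mean()
```

For a Poisson process both quantities are expected to be close to 1, whereas a perfectly regular spike train has a CV of 0; a task-related drop in the Fano factor corresponds to the decrease in neural variability described above.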