Perception of facial expression is crucial in the social life of primates. This visual information is processed along the ventral cortical pathway and the subcortical pathway. Face processing in the subcortical pathway is inaccurate, but the architectural and physiological properties responsible for this inaccuracy remain unclear. We analyzed the performance of convolutional neural networks that incorporate three prominent properties of this pathway: a shallow layer architecture, concentric receptive fields at the first processing stage, and a greater degree of spatial pooling. Networks designed in this way could be trained to classify seven facial expressions with an accuracy of 51% (chance level, 14%). This modest performance improved gradually as the three properties were replaced, one by one, two at a time, or all three simultaneously, with the corresponding features of the cortical pathway. Some processing units in the final layer were sensitive to spatial frequencies (SFs) in retina-based coordinates, whereas others were sensitive to object-based SFs, similar to neurons in the amygdala. Replacing any one of the three properties altered the SF coordinates of these units. Thus, all three properties constrain the accuracy of facial expression information in the subcortical pathway and are essential for determining the coordinate frame of the SF representation.
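
To make the three architectural properties concrete, the following is a minimal sketch of a shallow network of the kind described above, assuming a PyTorch implementation; the layer sizes, kernel widths, the 64x64 grayscale input, and names such as ShallowSubcorticalNet are illustrative assumptions rather than the authors' published model.

```python
# Minimal sketch (not the authors' code) of a shallow CNN with the three
# subcortical-pathway-inspired properties: shallow architecture, fixed
# concentric (center-surround) first-stage filters, and strong spatial pooling.
import torch
import torch.nn as nn


def concentric_dog_kernel(size: int, sigma_c: float, sigma_s: float) -> torch.Tensor:
    """Difference-of-Gaussians kernel approximating a concentric receptive field."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    r2 = xx ** 2 + yy ** 2
    center = torch.exp(-r2 / (2 * sigma_c ** 2))
    surround = torch.exp(-r2 / (2 * sigma_s ** 2))
    return center / center.sum() - surround / surround.sum()


class ShallowSubcorticalNet(nn.Module):
    """Illustrative shallow network for 7-way facial expression classification."""

    def __init__(self, n_classes: int = 7):
        super().__init__()
        # Property 2: concentric receptive fields at the first processing stage,
        # implemented here as fixed (non-learnable) difference-of-Gaussians kernels.
        self.retina = nn.Conv2d(1, 2, kernel_size=9, padding=4, bias=False)
        dog = concentric_dog_kernel(9, sigma_c=1.0, sigma_s=2.0)
        with torch.no_grad():
            self.retina.weight[0, 0] = dog      # ON-center channel
            self.retina.weight[1, 0] = -dog     # OFF-center channel
        self.retina.weight.requires_grad_(False)

        # Property 1: a shallow architecture with a single learnable conv stage.
        self.conv = nn.Conv2d(2, 16, kernel_size=5, padding=2)
        self.relu = nn.ReLU()
        # Property 3: a greater degree of spatial pooling (large pooling window).
        self.pool = nn.MaxPool2d(kernel_size=8)
        self.classifier = nn.Linear(16 * 8 * 8, n_classes)  # assumes 64x64 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.retina(x)
        x = self.relu(self.conv(x))
        x = self.pool(x)
        return self.classifier(torch.flatten(x, 1))


if __name__ == "__main__":
    net = ShallowSubcorticalNet()
    logits = net(torch.randn(4, 1, 64, 64))  # batch of 4 grayscale face images
    print(logits.shape)                      # torch.Size([4, 7])
```

Under this reading, the comparison with the cortical pathway would correspond to deepening the network, learning the first-stage filters rather than fixing them to concentric kernels, and shrinking the pooling window, alone or in combination.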