According to the dominant account of face processing, recognition of emotional expressions is implemented by the superior temporal sulcus (STS), while recognition of face identity is implemented by inferior temporal cortex (IT) (Haxby et al., 2000). However, recent patient and imaging studies (Fox et al., 2011; Anzellotti et al., 2017) found that the STS also encodes information about identity. Jointly representing expression and identity might be computationally advantageous: learning to recognize expressions could lead to the emergence of representations that also support identity recognition.

To test this hypothesis, we trained a deep densely connected convolutional network (DenseNet; Huang et al., 2017) to classify face images from the fer2013 dataset as angry, disgusted, afraid, happy, sad, surprised, or neutral. We then froze the weights of the DenseNet and trained linear layers attached to progressively deeper layers of the network to classify either emotion or identity using a subset of the Karolinska Directed Emotional Faces (KDEF) dataset. Finally, we tested emotion and identity classification on held-out KDEF images that were not used for training.

Classification accuracy for emotion in the KDEF dataset increased from early to late layers of the DenseNet, indicating successful transfer across datasets. Critically, classification accuracy for identity also increased from early to late layers, even though the network had never been trained to classify identity. A linear layer trained on the DenseNet features vastly outperformed a linear layer trained on raw pixels (98.8% vs. 68.7%), demonstrating that the high accuracy obtained with the DenseNet features cannot be explained by low-level confounds. These results show that learning to recognize facial expressions can lead to the spontaneous emergence of representations that support the recognition of identity, offering a principled computational account of the discovery of expression and identity representations within the same portion of the STS.
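
The following is a minimal sketch, not the authors' code, of the linear-probe procedure described above: a DenseNet is trained on 7-way emotion classification, its weights are frozen, and linear classifiers are then fit to activations read out from progressively deeper layers. The specific backbone (torchvision's DenseNet-121), the probe depth, the number of KDEF identities, and the input resolution are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 7      # angry, disgusted, afraid, happy, sad, surprised, neutral
NUM_IDENTITIES = 70   # assumed size of the probed KDEF identity subset

# 1. Emotion-trained backbone (torchvision DenseNet-121 used as a stand-in).
backbone = models.densenet121(weights=None)
backbone.classifier = nn.Linear(backbone.classifier.in_features, NUM_EMOTIONS)
# ... train `backbone` on fer2013 emotion labels, then freeze it:
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

def features_at_depth(x, depth):
    """Return pooled activations after the first `depth` modules of the DenseNet trunk."""
    h = x
    for i, layer in enumerate(backbone.features):
        h = layer(h)
        if i == depth:
            break
    return torch.flatten(nn.functional.adaptive_avg_pool2d(h, 1), 1)

# 2. A linear probe attached at a given depth, trained on KDEF images with
#    either emotion or identity labels while the backbone stays frozen.
class LinearProbe(nn.Module):
    def __init__(self, depth, num_classes):
        super().__init__()
        self.depth = depth
        with torch.no_grad():
            dim = features_at_depth(torch.zeros(1, 3, 224, 224), depth).shape[1]
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, x):
        with torch.no_grad():               # backbone features are never updated
            feats = features_at_depth(x, self.depth)
        return self.fc(feats)

# e.g., probe identity from a late layer of the emotion-trained network:
identity_probe = LinearProbe(depth=10, num_classes=NUM_IDENTITIES)
optimizer = torch.optim.Adam(identity_probe.fc.parameters(), lr=1e-3)
```

Repeating this for several values of `depth` and for both label sets yields the layer-by-layer accuracy curves reported for emotion and identity.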
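
A correspondingly simple sketch of the pixel-level control, under the same assumptions: a single linear layer is trained directly on flattened pixel intensities, so any identity information it recovers reflects low-level image statistics rather than learned expression features. The input resolution is again an assumption.

```python
import torch.nn as nn

IMG_SIZE = 224  # assumed resolution to which KDEF images are resized
NUM_IDENTITIES = 70

pixel_probe = nn.Linear(3 * IMG_SIZE * IMG_SIZE, NUM_IDENTITIES)
# Train `pixel_probe` on x.flatten(1) with identity labels, using the same
# KDEF train/test split as the DenseNet probes, and compare held-out accuracy
# (the abstract reports 98.8% for DenseNet features vs. 68.7% for pixels).
```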