This study seeks to identify human emotions using artificial neural networks. Emotions are difficult to understand and hard to measure quantitatively, but they may be reflected in facial expressions and tone of voice. Every speaker's voice has unique physical properties: timbre, pitch, tempo, and rhythm differ from person to person, and where a speaker lives can affect how words are pronounced and how certain emotions are expressed. Identifying human emotions is useful in the field of human-computer interaction, where it supports the development of software interfaces for community service centers, banks, education, and other domains. This research proceeds in three stages: data collection, feature extraction, and classification. The data are audio files from the Berlin Emo-DB database, containing human voices that express five emotion classes: angry, bored, happy, neutral, and sad. Features are extracted from every audio file using the Mel-Frequency Cepstral Coefficient (MFCC) method. Classification uses the Multi-Layer Perceptron (MLP), an artificial neural network method, and proceeds in two phases: training and testing. The MLP classifier achieves good emotion recognition: with 100 hidden-layer nodes, it yields an average accuracy of 72.80%, an average precision of 68.64%, an average recall of 69.40%, and an average F1-score of 67.44%.
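The pipeline summarized above (MFCC feature extraction followed by an MLP with 100 hidden nodes, evaluated with accuracy, precision, recall, and F1-score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of librosa and scikit-learn, the 13 MFCC coefficients, the per-file frame averaging, and the 80/20 train/test split are all assumptions.

```python
# Minimal sketch of an MFCC + MLP emotion-recognition pipeline.
# Assumptions (not from the paper): librosa for MFCC extraction,
# scikit-learn's MLPClassifier, 13 coefficients averaged over frames,
# and an 80/20 stratified train/test split.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

EMOTIONS = ["angry", "bored", "happy", "neutral", "sad"]

def extract_mfcc(path, n_mfcc=13):
    """Load one audio file and return a fixed-length MFCC feature vector."""
    y, sr = librosa.load(path, sr=None)                      # keep native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # average over frames

def train_and_evaluate(paths, labels):
    """Train an MLP with 100 hidden-layer nodes and report the four metrics."""
    X = np.stack([extract_mfcc(p) for p in paths])
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)                                # training phase
    y_pred = clf.predict(X_test)                             # testing phase
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    print(f"accuracy={accuracy_score(y_test, y_pred):.4f}  "
          f"precision={prec:.4f}  recall={rec:.4f}  f1={f1:.4f}")
```

Macro averaging is used here because the reported results are averages over the five emotion classes; whether the study averaged per class or per fold is not stated in this section.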