Precision, Recall, and F1-score are metrics commonly used to evaluate model performance. Precision and Recall are important to consider when the data is balanced, whereas for imbalanced data the F1-score is the most informative metric. Determining the relevance of these metrics requires a comparative analysis to establish which metric is appropriate for the data under study. This study performs a comparative analysis of several evaluation metrics on imbalanced data in multi-class text classification. It uses an imbalanced multi-class text dataset with four classes: association, negative, cause of disease, and treatment of disease. As the algorithm-level approach, the study involves five classifiers: Multinomial Naive Bayes, K-Nearest Neighbors, Support Vector Machine, Random Forest, and Long Short-Term Memory. As the data-level approach, it applies undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE). The evaluation metrics used to assess model performance are Precision, Recall, and F1-score. The results show that the most suitable evaluation metric for imbalanced data depends on the purpose of use and the desired priority, as does the choice of classifier for handling multi-class tasks on imbalanced data. These findings can assist practitioners in selecting evaluation metrics that match the goals and application needs of multi-class text classification.
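To make concrete how these metrics are computed and why the averaging scheme matters on imbalanced multi-class data, the sketch below uses scikit-learn's standard metric functions; the class names mirror the four-class setting above, but the labels and predictions themselves are hypothetical, chosen only to produce a skewed class distribution.

```python
# A minimal sketch, assuming scikit-learn is available; the labels and
# predictions below are hypothetical, mirroring the four-class setting.
from sklearn.metrics import precision_recall_fscore_support

classes = ["association", "negative", "cause of disease", "treatment of disease"]

# Hypothetical ground truth with a skewed class distribution (8/4/2/1).
y_true = (["association"] * 8 + ["negative"] * 4 +
          ["cause of disease"] * 2 + ["treatment of disease"] * 1)
# Hypothetical predictions that favor the majority class.
y_pred = (["association"] * 8 + ["negative"] * 3 + ["association"] +
          ["cause of disease"] + ["association"] + ["association"])

# Macro averaging weights every class equally; weighted averaging
# weights each class by its support, so majority classes dominate.
for avg in ("macro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=classes, average=avg, zero_division=0)
    print(f"{avg:>8}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Under this kind of skew, macro averaging exposes the complete failure on the rarest class, while weighted averaging masks it behind the majority class's strong scores; this is the trade-off between averaging schemes that a comparative analysis of metrics on imbalanced data has to weigh.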