Text emotion detection is a pivotal aspect of natural language processing, with wide-ranging applications in human-computer interaction. Machine learning models for this task are typically trained with supervised methods and therefore rely on labeled datasets. However, the arbitrary selection of an emotion model when labeling such datasets poses significant challenges to the performance and generalizability of the resulting machine learning models, particularly when they are evaluated on unseen data, as it effectively introduces bias into the process. This study investigates the impact of emotion model selection on the efficacy of machine learning models for text emotion detection. Eight labeled datasets were employed to train linear regression, feedforward neural network, and BERT-based deep learning models. Results demonstrated a notable decrease in accuracy when models trained on one dataset were tested on others, underscoring the inherent incompatibility of labeling schemes across datasets. To demonstrate that the emotion model significantly impacts model performance, we propose a standardized emotion label mapping based on James Russell's Circumplex Model of Affect, which turns the emotion model into a parameter rather than a fixed element. Cross-dataset testing with this shared emotion mapping yielded significant, non-negligible changes in accuracy (both improvements and degradations). This finding highlights the impact of the (traditionally arbitrarily selected) emotion model on machine learning training and performance, and suggests that accuracy improvements reported in the related literature might be due to differences in the emotion model used rather than to the newly introduced algorithms.
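
To illustrate the idea of treating the emotion model as a parameter, the following minimal Python sketch remaps dataset-specific emotion labels onto the valence-arousal quadrants of Russell's Circumplex Model; the label names and quadrant assignments are illustrative assumptions, not the exact mapping used in this study.

```python
# Hypothetical sketch: projecting dataset-specific emotion labels onto the
# valence-arousal quadrants of Russell's Circumplex Model of Affect, so that
# datasets annotated with different emotion models share one label space.
# The label-to-quadrant assignments below are illustrative assumptions only.

from typing import Dict

# Quadrants of the Circumplex Model, encoded as (valence, arousal) signs.
QUADRANTS = {
    "positive-high": (+1, +1),   # e.g., joy, excitement
    "positive-low":  (+1, -1),   # e.g., calm, contentment
    "negative-high": (-1, +1),   # e.g., anger, fear
    "negative-low":  (-1, -1),   # e.g., sadness, boredom
}

# Illustrative per-dataset mappings (assumed label names, not real datasets).
DATASET_A: Dict[str, str] = {
    "joy": "positive-high",
    "anger": "negative-high",
    "sadness": "negative-low",
    "calm": "positive-low",
}

DATASET_B: Dict[str, str] = {
    "happiness": "positive-high",
    "fear": "negative-high",
    "boredom": "negative-low",
    "contentment": "positive-low",
}

def remap(label: str, mapping: Dict[str, str]) -> str:
    """Project a dataset-specific emotion label onto the shared quadrant scheme."""
    return mapping[label]

if __name__ == "__main__":
    # Labels from two differently annotated datasets land in the same space,
    # making cross-dataset training and testing directly comparable.
    print(remap("joy", DATASET_A))        # positive-high
    print(remap("happiness", DATASET_B))  # positive-high
```

Under this kind of shared mapping, the labeling scheme itself becomes an experimental parameter that can be held fixed while algorithms or datasets vary.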