Emotion researchers who use experience sampling methods (ESM) study how emotions fluctuate in everyday life. To reach valid conclusions, confirming the reliability of momentary emotion measurements is essential. However, to minimize participant burden, ESM researchers often use single-item measures, which precludes assessing the reliability of people’s emotion ratings with conventional multi-item approaches. Furthermore, because emotions constantly change, checking reliability via conventional test–retest procedures is impractical: measurement error cannot be separated from meaningful emotional variability. Here, drawing from classical test theory (CTT), we propose two time-varying test–retest adaptations to evaluate the reliability of single-item (emotion) measures in ESM. In Method 1, we randomly repeat one emotion item within the same momentary survey and evaluate the discrepancy between test and retest ratings to determine reliability. In Method 2, we introduce a subsequent, shortly delayed retest survey and extrapolate the size of test–retest discrepancies to the hypothetical case in which no time elapses between assessments. First, in an analytical study, we establish the mathematical relation between observed test–retest discrepancies and measurement error variance for both methods, based on common assumptions in the CTT literature. Second, in two empirical studies, we apply both methods to real-life emotion time series and find that the size of error in people’s emotion ratings corresponds to almost a tenth of the scale, accounting for around 27% of the total variability in participants’ affective responses. Consequently, disregarding measurement error in ESM is problematic, and we encourage researchers to include a test–retest procedure in their future studies when relying on single-item measures.
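To make the link between test–retest discrepancies and error variance concrete, the following is a minimal sketch under standard CTT assumptions (observed rating = true score + error, with independent, equal-variance errors at test and retest). Under those assumptions the expected squared discrepancy between two ratings of the same true score equals twice the error variance. The function names, the simulated data, and in particular the linear extrapolation over time lag used for Method 2 are illustrative assumptions, not the procedure reported in the studies.

```python
import numpy as np

def error_variance_same_survey(test, retest):
    """Method 1 sketch: with X1 = T + e1 and X2 = T + e2,
    E[(X1 - X2)^2] = 2 * Var(e), so Var(e) = mean squared discrepancy / 2."""
    d = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    return np.mean(d ** 2) / 2.0

def error_variance_delayed(test, retest, lags):
    """Method 2 sketch (ASSUMPTION: squared discrepancies grow linearly
    with the time lag). Regress squared discrepancies on lag and read off
    the intercept, i.e. the expected squared discrepancy at lag zero."""
    d2 = (np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)) ** 2
    slope, intercept = np.polyfit(np.asarray(lags, dtype=float), d2, 1)
    return intercept / 2.0

def reliability(error_var, total_var):
    """CTT reliability: share of observed variance that is not error."""
    return 1.0 - error_var / total_var

# Simulated illustration with arbitrary values (not data from the studies).
rng = np.random.default_rng(0)
n = 20_000
sd_true, sd_error, drift_per_minute = 15.0, 8.0, 10.0

true_score = rng.normal(50.0, sd_true, n)
test = true_score + rng.normal(0.0, sd_error, n)

# Method 1: retest within the same survey, true score unchanged.
retest_same = true_score + rng.normal(0.0, sd_error, n)
est_same = error_variance_same_survey(test, retest_same)

# Method 2: retest after a short delay; the true score drifts with the lag.
lags = rng.uniform(1.0, 10.0, n)  # minutes between test and retest
drift = rng.normal(0.0, np.sqrt(drift_per_minute * lags))
retest_delayed = true_score + drift + rng.normal(0.0, sd_error, n)
est_delayed = error_variance_delayed(test, retest_delayed, lags)

rel = reliability(est_same, np.var(test))
print(est_same, est_delayed, rel)
```

In this simulation both estimates should land near the simulated error variance (8² = 64), and the reliability estimate near 225 / (225 + 64). With real ESM data, the form of the extrapolation in Method 2 would need to be justified rather than assumed linear.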