The approach-avoidance task (AAT) is an implicit task that measures people’s behavioral tendencies to approach or avoid stimuli in the environment. In recent years, it has been used successfully to help explain a variety of health problems (e.g., addictions and phobias). Unfortunately, more recent AAT studies have failed to replicate earlier promising findings. One explanation for these replication failures could be that the AAT does not reliably measure approach-avoidance tendencies. Here, we first review the existing literature on the reliability of various versions of the AAT. Next, we examine the AAT’s reliability in a large and diverse sample (N = 1077, of whom 248 completed all sessions). Using a smartphone-based, mobile AAT, we measured participants’ approach-avoidance tendencies eight times over a period of seven months (one measurement per month) with two distinct stimulus sets (happy/sad expressions and disgusting/neutral stimuli). The mobile AAT’s split-half reliability was adequate for face stimuli (r = .85), but low for disgust stimuli (r = .72). Its test–retest reliability based on a single measurement was poor for both stimulus sets (all ICC1s < .3). Its test–retest reliability based on the average of all eight measurements was moderately good for face stimuli (ICCk = .73), but low for disgust stimuli (ICCk = .5). These results suggest that single-measurement AATs could be influenced by unexplained temporal fluctuations in approach-avoidance tendencies. These fluctuations could be examined in future studies. Until then, this work suggests that research using the AAT should rely on multiple rather than single measurements.
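
As a rough consistency check on the two kinds of test–retest figures reported above, the Spearman-Brown formula relates the reliability of a single measurement to the reliability of the average of k measurements. The sketch below is illustrative only: it assumes a one-way random-effects ICC model and complete data, neither of which is stated here, and the function names are hypothetical.

```python
def average_reliability(icc_single: float, k: int) -> float:
    """Spearman-Brown: reliability of the mean of k measurements."""
    return k * icc_single / (1 + (k - 1) * icc_single)


def single_reliability(icc_average: float, k: int) -> float:
    """Inverse Spearman-Brown: single-measurement reliability implied
    by the reliability of the mean of k measurements."""
    return icc_average / (k - (k - 1) * icc_average)


K = 8  # eight monthly measurements, as reported above

# Single-measurement reliabilities implied by the reported averaged ICCs
# (assumption: one-way random-effects model, no missing sessions).
implied_faces = single_reliability(0.73, K)
implied_disgust = single_reliability(0.50, K)

print(f"faces: {implied_faces:.2f}, disgust: {implied_disgust:.2f}")
```

Under these assumptions, both implied values fall below .3, in line with the single-measurement ICCs reported above.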