Background: Over the past 30 years, the emotional dot probe task has been widely used to measure attentional bias to threat as a risk factor for psychopathology. Recent work suggests, however, that the task has very low reliability, indicating that attentional biases are inconsistent across tests and trials. The current study aimed to gain a more comprehensive understanding of the reliability of the emotional dot probe task across many test versions, to identify whether any parameters might yield reliable scores.
Methods: Thirty-six versions of the emotional dot probe task were implemented, varying the type of threat and neutral stimuli (faces, scenes, snakes/spiders), timing parameters (stimulus onset asynchrony of 100, 500, or 900 milliseconds), stimulus orientation (horizontal, vertical), and comparison condition (threat incongruent cue, all neutral cues). In Study 1, 7794 participants recruited through TestMyBrain.org were randomized to one of the 36 versions. Split-half reliability was estimated for each test version. In Study 2, 1839 participants completed one of the eight versions of the emotional dot probe task with the highest estimated reliability from Study 1, along with measures of anxiety (GAD-7 and Brief Hypervigilance Scale). Split-half reliability of threat facilitation scores was estimated for each test version, in all participants and in highly anxious participants only.
Results: Split-half reliabilities of threat facilitation scores were low (often indistinguishable from zero) across all versions. Only one version with nonzero reliability in Study 1 also showed nonzero reliability in Study 2 (ρ = 0.23, p < 0.05).
Reliability was similarly nonsignificant in anxious individuals, except in the top decile of hypervigilance scores (but not in the top quartile or in high scorers on the GAD-7), where the reliability estimate from parametric tests was low (ρ = 0.29) but nominally significant at an uncorrected threshold of p < 0.05. However, split-half reliability in the top decile was no longer significant when we used nonparametric correlations (ρ = 0.06, p = 0.67).
Conclusions: The emotional dot probe task is not an adequately reliable measure of individual differences in attentional bias to threat. We identified no parameters that produced attentional bias scores reliable enough to justify inclusion in research studies that seek to quantify differences between people, including individuals with elevated anxiety.
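
For readers unfamiliar with the metric, the following is a minimal sketch of how a split-half reliability estimate for a reaction-time difference score can be computed. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not specify how trials were split, whether a Spearman-Brown correction was applied, or exactly how the threat facilitation score was computed. The sketch assumes an odd/even trial split, a Pearson correlation between halves, and a bias score defined as mean comparison-condition reaction time minus mean threat-congruent reaction time. The function name `split_half_reliability` and its arguments are hypothetical.

```python
import numpy as np


def split_half_reliability(congruent_rt, comparison_rt):
    """Odd/even split-half reliability of a reaction-time difference score.

    Both inputs are arrays of shape (n_participants, n_trials) holding
    reaction times in milliseconds. The scoring rule below is an assumption
    made for illustration, not the definition used in the study.
    """
    congruent_rt = np.asarray(congruent_rt, dtype=float)
    comparison_rt = np.asarray(comparison_rt, dtype=float)

    def bias_score(trials):
        # Assumed score: mean comparison RT minus mean threat-congruent RT,
        # computed per participant on the selected subset of trials.
        return (comparison_rt[:, trials].mean(axis=1)
                - congruent_rt[:, trials].mean(axis=1))

    first_half = bias_score(slice(0, None, 2))   # trials 0, 2, 4, ...
    second_half = bias_score(slice(1, None, 2))  # trials 1, 3, 5, ...

    # Pearson correlation between the two half-test scores across participants.
    r_half = np.corrcoef(first_half, second_half)[0, 1]

    # Spearman-Brown correction: projected reliability of the full-length test.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


if __name__ == "__main__":
    # Simulated data with no stable individual differences in bias, so the
    # two halves share only noise.
    rng = np.random.default_rng(0)
    congruent = rng.normal(500, 80, size=(500, 40))
    comparison = rng.normal(510, 80, size=(500, 40))
    print(split_half_reliability(congruent, comparison))
```

When the simulated participants have no stable bias, as in the usage example, the half-test correlation should hover near zero, which is the pattern the Results describe for most task versions. Replacing `np.corrcoef` with a rank-based correlation would give the nonparametric variant mentioned in the Results.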