Can human listeners use implicit temporal contingencies in auditory input to form temporal predictions, and if so, how are these predictions represented endogenously? To address this question, we manipulated foreperiods in an auditory pitch discrimination task: unbeknownst to participants, the pitch of the standard tone could either be deterministically predictive of the temporal onset of the target tone, or convey no predictive information. Predictive and non-predictive conditions were interleaved in one stream and separated by variable inter-stimulus intervals, such that there was no dominant stimulus rhythm throughout. Even though participants were unaware of the implicit temporal contingencies, pitch discrimination sensitivity (the slope of the psychometric function) increased when the onset of the target tone was predictable in time (N = 49, 28 female, 21 male). Concurrently recorded EEG data (N = 24) revealed that standard tones that conveyed temporal predictions evoked a more negative N1 component than non-predictive standards. We observed no significant differences in oscillatory power or phase coherence between conditions during the foreperiod. Importantly, the phase angle of delta oscillations (1–3 Hz) in auditory areas in the post-standard and pre-target time window predicted behavioral pitch discrimination sensitivity. This suggests that temporal predictions can be initiated by an optimized delta phase reset and are encoded in delta oscillatory phase during the foreperiod interval. In sum, we show that auditory perception benefits from implicit temporal contingencies, and provide evidence for a role of slow neural phase in the endogenous representation of temporal predictions, in the absence of exogenously driven entrainment to rhythmic input.
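To make the phase measures named above concrete, the following is a minimal sketch of how phase coherence across trials and a mean phase angle can be computed from single-trial phase angles. It assumes the phase angles (in radians) have already been extracted, for example from delta-band filtered EEG. The function names and the example values are hypothetical illustrations and are not taken from the study's analysis pipeline.

```python
import cmath
import math


def phase_coherence(phases):
    """Length of the mean resultant vector of phase angles (radians).

    Returns a value in [0, 1]: 1 means identical phase on every trial,
    values near 0 mean the phases cancel out across trials.
    """
    if not phases:
        raise ValueError("need at least one phase angle")
    # Represent each phase as a unit vector in the complex plane and average.
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


def mean_phase(phases):
    """Circular mean of phase angles (radians), in the range [-pi, pi]."""
    if not phases:
        raise ValueError("need at least one phase angle")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(mean_vector)


# Hypothetical single-trial phase angles, for illustration only.
aligned = [0.1, -0.1, 0.05, -0.05]                    # tightly clustered around 0
spread = [0.0, math.pi / 2, math.pi, -math.pi / 2]    # evenly spread on the circle

coherence_aligned = phase_coherence(aligned)  # close to 1
coherence_spread = phase_coherence(spread)    # close to 0
```

Coherence quantifies how consistent the phase is across trials, whereas the mean phase describes which angle the trials cluster around. The two can therefore dissociate: conditions may show similar coherence while still differing in the phase angle that relates to behavior.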
Acknowledgments: This research was supported by a DFG grant (HE 7520/1-1) to SKH.

Auditory environments come with an inherent temporal structure, which human listeners can use to predict the timing of future inputs. Yet, how these regularities in sensory inputs are transformed into temporal predictions is not known. Here, we implicitly induced temporal predictability in the absence of a rhythmic input structure, to avoid exogenously driven entrainment of neural oscillations. Our results show that even implicit and non-rhythmic temporal predictions are extracted and used by human listeners, underlining the role of timing for auditory processing. Furthermore, our EEG results point towards an instrumental role of delta oscillations in initiating temporal predictions, possibly by an optimized phase reset in response to a temporally predictive cue.