“…The segregation of a complex auditory mixture is thought to involve a multistage hierarchy of processing, in which initial pre-attentive processes that partition the sound waveform into distinct acoustic features (e.g., pitch, harmonicity) are followed by later, post-perceptual processes governed by grouping principles (Koffka, 1935), such as physical similarity, temporal proximity, and good continuation (Bregman, 1990), and by phonetic template matching (Alain et al., 2005a; Meddis and Hewitt, 1992). Psychophysical research from the past several decades confirms that human listeners exploit fundamental frequency (F0) differences (i.e., pitch) to segregate concurrent speech (Arehart et al., 1997; Assmann and Summerfield, 1989; Assmann and Summerfield, 1990; Assmann and Summerfield, 1994; Chintanpalli et al., 2016; de Cheveigne et al., 1997). For example, when two steady-state (time-invariant) synthetic vowels are presented simultaneously to the same ear, listeners’ identification accuracy increases when a difference of four semitones (STs) is introduced between the vowels’ F0s (Assmann and Summerfield, 1989; Assmann and Summerfield, 1990; Assmann and Summerfield, 1994; Culling, 1990; McKeown, 1992; Scheffers, 1983; Zwicker, 1984).…”
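A semitone separation is a fixed ratio on a log-frequency scale (one ST corresponds to a factor of 2^(1/12)), so the four-ST difference cited above can be made concrete with a minimal sketch. The snippet below converts an ST separation into the second vowel's F0 in hertz; the 100 Hz reference F0 and the function name f0_shift are illustrative assumptions, not values or code taken from the studies cited.

```python
# Sketch: convert a semitone (ST) separation into the F0 of a second vowel,
# given a reference F0. The ratio 2**(n/12) is the standard definition of an
# n-semitone interval; the 100 Hz reference below is an assumed example value.

def f0_shift(f0_ref_hz: float, semitones: float) -> float:
    """Return the F0 (Hz) lying `semitones` above `f0_ref_hz`."""
    return f0_ref_hz * 2.0 ** (semitones / 12.0)

if __name__ == "__main__":
    f0_a = 100.0                      # reference vowel F0 in Hz (assumed)
    for d_st in (0, 1, 2, 4):         # candidate F0 separations in STs
        f0_b = f0_shift(f0_a, d_st)
        print(f"dF0 = {d_st} ST -> second vowel F0 = {f0_b:.2f} Hz")
```

For a 100 Hz reference, a four-ST separation places the second vowel's F0 near 126 Hz (100 × 2^(4/12) ≈ 125.99 Hz), i.e., roughly a 26% frequency difference between the two concurrent vowels.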