Human listeners are remarkably adept at perceiving speech and other sounds in unfavorable acoustic environments. Typically, the sound source of interest is contaminated by other acoustic sources, and listeners are therefore faced with the problem of unscrambling the mixture of sounds that arrives at their ears. Nonetheless, human listeners can segregate one voice from a mixture of many voices at a cocktail party, or follow a single melodic line in a performance of orchestral music. Much as the visual system must combine information about edges, colors and textures in order to identify perceptual wholes (e.g., a face or a table), so the auditory system must solve an analogous auditory scene analysis (ASA) problem in order to recover a perceptual description of a single sound source (Bregman 1990). Understanding how the ASA problem is solved at the physiological level is one of the greatest challenges of hearing science, and lies at the core of the "systems" approach of this book.

The emerging field of computational auditory scene analysis (CASA) aims to develop machine systems that mimic the ability of human listeners to perceptually segregate acoustic mixtures (Wang and Brown 2006). However, most CASA systems are motivated by engineering applications (e.g., robust automatic speech recognition in noise), and take inspiration from psychophysical and physiological accounts of human ASA without slavishly adhering to them. The review in this chapter is therefore necessarily selective, and concerns only those computational models of ASA that are based on physiologically plausible mechanisms.

The next section briefly reviews the psychology of ASA, and then Sect. 8.3 considers the likely physiological basis of ASA in general terms, together with its relationship to the wider issue of feature binding in the brain. Computer models of specific ASA