The review gives an introductory description of the successive development of data patterns based on comparisons between hearing-impaired and normal hearing participants’ speech understanding skills, later prompting the formulation of the Ease of Language Understanding (ELU) model. The model builds on the interaction between an input buffer (RAMBPHO, Rapid Automatic Multimodal Binding of PHOnology) and three memory systems: working memory (WM), semantic long-term memory (SLTM), and episodic long-term memory (ELTM). RAMBPHO input may either match or mismatch multimodal SLTM representations. Given a match, lexical access is accomplished rapidly and implicitly within approximately 100–400 ms. Given a mismatch, the prediction is that WM is engaged explicitly to repair the meaning of the input – in interaction with SLTM and ELTM – taking seconds rather than milliseconds. The multimodal and multilevel nature of representations held in WM and LTM are at the center of the review, being integral parts of the prediction and postdiction components of language understanding. Finally, some hypotheses based on a selective use-disuse of memory systems mechanism are described in relation to mild cognitive impairment and dementia. Alternative speech perception and WM models are evaluated, and recent developments and generalisations, ELU model tests, and boundaries are discussed.