Automatic speaker verification (ASV) technology now reports a reasonable level of accuracy in voice-based biometric applications. However, it requires an adequate amount of speech data for enrolment and verification; otherwise, performance degrades considerably. For this reason, the trade-off between convenience and security is difficult to maintain in practical scenarios, and utterance duration remains a critical issue when deploying a voice biometric system in real-world applications. A large amount of research has been carried out to address the limited-data issue within the scope of speaker verification (SV), and research activity on mitigating the challenges posed by short utterances has risen significantly in recent times. In this study, the authors present an extensive survey of SV with short utterances, considering studies from the recent past and including the latest research offering various solutions and analyses. The review also summarises the major findings of studies on the duration variability problem in ASV systems. Finally, they discuss a number of possible future directions to promote further research in this field.

2 Brief overview of ASV

An ASV system includes three fundamental modules [1, 2]: a feature extraction unit, which transforms the speech signal into a compact form; a statistical modelling unit, which characterises the extracted features; and a classification module, which classifies the test speech.

2.1 Feature extraction approaches

State-of-the-art ASV systems use three major types of feature extraction techniques: sub-segmental, segmental and suprasegmental analyses. Speech signals analysed using the frame size