In recent years, face recognition has developed rapidly, but it is still hampered by variations in facial expression, pose, and occlusion. In contrast to the face, the ear has a stable 3D structure and is nearly unaffected by aging and changes in expression. Both the face and the ear can be captured at a distance and in a nonintrusive manner, which makes them suitable for a wide range of application domains. Given their physiological structure and adjacent locations, the ear can readily serve as a supplement to the face for biometric recognition. Combining the face and ear into a nonintrusive multimodal recognition system has therefore become a trend, with the aim of improving accuracy, robustness, and security. However, when either the face or the ear suffers from data degradation and the fusion rule is fixed or insufficiently flexible, a multimodal system may perform worse than a unimodal system that uses only the modality with the better-quality sample. Biometric quality-based adaptive fusion is one avenue for addressing this issue. In this paper, we present an overview of the literature on multimodal biometrics using the face and ear, and we classify the approaches into categories according to their fusion levels. Finally, we focus on an adaptive multimodal identification system that adopts a general biometric quality assessment (BQA) method and dynamically integrates the face and ear via sparse representation. In addition to refining the BQA and the selection of fusion weights, we extend the experiments to more datasets and more types of image degradation for a more thorough evaluation.
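The claim that a fixed fusion rule can underperform the better single modality can be illustrated with a toy calculation. The sketch below is a minimal illustration under stated assumptions, not the system described above: it assumes that per-identity reconstruction residuals (as produced by a sparse-representation classifier) have already been computed and normalized to a comparable scale for each modality, and that a quality score in [0, 1] is available for each sample. The function `fuse_residuals` and all numbers are hypothetical.

```python
from typing import Dict


def fuse_residuals(
    face_res: Dict[str, float],
    ear_res: Dict[str, float],
    q_face: float,
    q_ear: float,
) -> str:
    """Return the identity with the smallest quality-weighted fused residual.

    Hypothetical sketch: residuals are assumed to be pre-computed per identity
    and already normalized so the two modalities are on a comparable scale.
    """
    total = q_face + q_ear
    if total <= 0:
        raise ValueError("at least one quality score must be positive")
    # Normalize the quality scores into fusion weights that sum to 1.
    w_face = q_face / total
    w_ear = q_ear / total
    # Weighted sum of residuals; a smaller residual means a better match.
    fused = {c: w_face * face_res[c] + w_ear * ear_res[c] for c in face_res}
    return min(fused, key=fused.get)


# Toy data (invented): the true identity is "B". The face is degraded (e.g. occluded)
# and confidently wrong; the ear is clean and correct by a small margin.
face = {"A": 0.30, "B": 0.70}
ear = {"A": 0.55, "B": 0.45}

print(min(ear, key=ear.get))                 # ear alone picks "B"
print(fuse_residuals(face, ear, 0.5, 0.5))   # fixed equal weights pick "A"
print(fuse_residuals(face, ear, 0.1, 0.9))   # quality-based weights pick "B"
```

With equal weights, the degraded face dominates the fused score (0.425 for "A" versus 0.575 for "B"), so fusion performs worse than the ear alone. Down-weighting the low-quality face sample restores the correct decision.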