With growing awareness of data privacy, federated learning (FL) has gained increasing attention in recent years as a major privacy-preserving training paradigm: it allows models to be built collaboratively across clients without exchanging their raw data. However, most current FL work assumes unimodal clients. With the rise of edge computing, diverse sensors and wearable devices generate large amounts of data across different modalities, which has inspired research on multimodal federated learning (MMFL). In this survey, we explore MMFL and the fundamental challenges of applying FL to multimodal data. First, we analyse the key motivations for MMFL. Second, we classify currently proposed MMFL methods according to their modality distributions and modality annotations. Then, we discuss MMFL datasets and application scenarios. Finally, we highlight the limitations and open challenges of MMFL and provide insights and directions for future research.
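To make the FL principle mentioned above concrete (clients share model updates, never raw data), the following is a minimal FedAvg-style sketch in Python. The function names, the stubbed local update, and the toy parameter shapes are illustrative assumptions only and are not drawn from any specific MMFL method discussed in this survey.

```python
# Minimal sketch of FedAvg-style aggregation (illustrative assumption, not a
# specific surveyed method). Each client trains locally on its private data;
# only model parameters are sent to the server for averaging.
from typing import Dict, List
import numpy as np

def local_update(weights: Dict[str, np.ndarray], data, lr: float = 0.01) -> Dict[str, np.ndarray]:
    """Placeholder for one round of local training on a client's private (possibly unimodal) data."""
    # In practice: several epochs of SGD on the client's own data; here a no-op stub.
    return {k: v - lr * np.zeros_like(v) for k, v in weights.items()}

def federated_average(client_weights: List[Dict[str, np.ndarray]],
                      client_sizes: List[int]) -> Dict[str, np.ndarray]:
    """Server-side aggregation: average client models weighted by local dataset size."""
    total = sum(client_sizes)
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in client_weights[0]
    }

# One communication round: clients update locally, the server averages the results.
global_w = {"layer": np.zeros((4, 4))}
updates = [local_update(global_w, data=None) for _ in range(3)]
global_w = federated_average(updates, client_sizes=[100, 250, 150])
```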