Large language models have advanced considerably in recent years and now produce compelling responses to extended verbal inputs. After the release of ChatGPT 3.5, researchers identified several opportunities and challenges of large language models in various fields of education. At that point, however, it was unforeseeable how quickly a multitude of technological advances would emerge and how dynamically educational research would change, facing an increasing number of challenges related to large language models. Large language models can now be used as middleware connecting various AI tools and other large language models to solve complex tasks. This has led to the development of so-called large multimodal foundation models, such as ChatGPT-4-Turbo and Gemini, which not only interact with the user via written text but can also process spoken text, music, images and videos. These models open up vast new opportunities and come with unexpected challenges. In this overview, we outline and explain the new set of opportunities and challenges in education that arise from large multimodal foundation models for learners, teachers, educational researchers and developers of educational tools, in addition to the opportunities and challenges of conventional large language models.