Multimodal interaction aims to make the interaction between humans and machines more natural, increasing the usability, flexibility, and convenience of an application. Adding multimodal features to an application affects its architecture, and several architecture models have been proposed in the literature to describe the main components needed to handle multimodality, including models for multimodal Web systems. E-learning environments are Web-based systems that require good usability, flexibility, and convenience, requirements that multimodal features can help meet. Because these environments have their own peculiarities, they need a more specific multimodal architecture model, described in a way that allows components built for multimodal systems to be reused and connected to the components of the e-learning environment. This chapter proposes an architecture for multimodal e-learning environments. A viability study was carried out in Ae, an e-learning environment developed using a component-based development process, with components that handle the pen and touch modalities.