One major goal in the development of virtual reality (VR) toolkits is to provide interfaces for novel input and output hardware in order to support multimodal interaction. The research community has produced several implementations that feature a large variety of device interfaces and abstractions. Drawing on lessons learned from these existing approaches, this paper summarizes the requirements for the design of a driver layer that forms the basis of a multimodal input and output system. From these requirements we derive a general model for driver architectures, which can be used to reason about the different implementations found in available architectures. Because the flow of data through the system is of particular interest, we take a closer look at common patterns of data processing. Finally, we discuss a number of openly accessible driver architectures currently used for VR development.