When trained on related or low-resource languages, multilingual speech recognition models often outperform their monolingual counterparts. However, these models can suffer a loss in performance for high-resource or unrelated languages. We investigate the use of a mixture-of-experts approach to assign per-language parameters in the model, increasing network capacity in a structured fashion. We introduce a novel variant of this approach, 'informed experts', which tackles inter-task conflicts by eliminating gradients from other tasks in these task-specific parameters. We conduct experiments on a real-world task with English, French and four dialects of Arabic to show the effectiveness of our approach. Our model matches or outperforms the monolingual models for almost all languages, with gains of as much as 31% relative. Our model also outperforms the baseline multilingual model for all languages by up to 9% relative.

Index Terms: end-to-end speech recognition, multilingual, RNN-T, language id, mixture of experts

* The first two authors contributed equally. The rest of the list is sorted alphabetically.

... variations in amounts of training data. In this paper, we propose one multilingual model to transcribe languages with varied amounts of training data. We use the mixture-of-experts approach (MOE) [8] and adapt it to exploit the inherent structure of the data to simultaneously learn per-language experts.
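As a rough illustration of the per-language expert idea (not the exact architecture used in this paper), the sketch below shows a hypothetical PyTorch layer in which each language owns a dedicated expert selected by its known language ID. Because a batch only passes through its own expert, gradients from other languages never reach those task-specific parameters, which is one simple way to realize the gradient isolation described above. The class name InformedExpertLayer, the feed-forward expert shape, and the shared residual path are illustrative assumptions.

```python
import torch
import torch.nn as nn


class InformedExpertLayer(nn.Module):
    """Hypothetical per-language expert layer.

    Each language has its own expert sub-network, selected by a known
    language ID rather than a learned router. Only the selected expert
    participates in the forward pass, so its parameters receive gradients
    exclusively from batches of its own language.
    """

    def __init__(self, dim: int, num_languages: int):
        super().__init__()
        # One small feed-forward expert per language (sizes are illustrative).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_languages)
        ])
        # Shared parameters, trained on batches from every language.
        self.shared = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # x: (batch, time, dim) features for a single-language batch.
        # Shared path sees all languages; expert path sees only its own.
        return self.shared(x) + self.experts[lang_id](x)


# Usage: route a batch of language index 1 through that language's expert only.
layer = InformedExpertLayer(dim=256, num_languages=6)
features = torch.randn(4, 10, 256)
out = layer(features, lang_id=1)
```

Selecting experts by language ID rather than a learned gate is one interpretation of "informed" routing: the assignment uses side information available at training time, so expert parameters are never updated by gradients from other languages, while the shared path still benefits from multilingual training.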