As automation in aerospace decision support systems increases, the human operator must maintain a high level of situational awareness across operational conditions and retain a central role in the decision-making process. While current aerospace systems and interfaces are limited in their adaptability, a Cognitive Human Machine System (CHMS) aims to perform dynamic, real-time system adaptation by estimating the cognitive states of the human operator. To reliably drive system adaptation in current and emerging aerospace systems, however, cognitive states, particularly Mental Workload (MWL), must be estimated accurately and repeatably in real time. In this study, two sessions were performed using a Multi-Attribute Task Battery (MATB) scenario: one for offline calibration and validation, and one for online validation of eleven multimodal inference models of MWL. The multimodal inference models were based on an Adaptive Neuro Fuzzy Inference System (ANFIS), which was used in different configurations to fuse data from an Electroencephalogram (EEG) model’s output, four eye activity features and a control input feature. In the online validation, five of the ANFIS models (containing different combinations of eye activity and control input features) performed well; the best performing model (containing all four eye activity features and the control input feature) achieved an average Mean Absolute Error (MAE) = 0.67 ± 0.18 and Correlation Coefficient (CC) = 0.71 ± 0.15. The remaining six ANFIS models included data from the EEG model’s output, which had an offset discrepancy that resulted in an equivalent offset in the online multimodal fusion. Nonetheless, the efficacy of these models could be seen in their pairwise correlation with the task level, where one model demonstrated a CC = 0.77 ± 0.06, the highest among all the ANFIS models tested. Hence, this study demonstrates that online multimodal fusion of features extracted from EEG signals, eye activity and control inputs can produce an accurate and repeatable inference of MWL.
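As an illustration of the two reported metrics, the following minimal sketch computes MAE and CC between an inferred MWL trace and the task level, and shows why a constant offset degrades MAE while leaving the correlation unchanged. It rests on assumptions not confirmed by the text: CC is taken to be the Pearson coefficient, the "±" values are taken to be a sample standard deviation across participants, and all function names and data values are hypothetical.

```python
import math


def mean_absolute_error(target, estimate):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(t - e) for t, e in zip(target, estimate)) / len(target)


def pearson_cc(x, y):
    """Pearson correlation coefficient (assumed definition of CC)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_and_std(values):
    """Mean and sample standard deviation (assumed meaning of 'mean ± std')."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


# Hypothetical task-level profile and an inferred MWL trace that tracks it
# perfectly except for a constant offset of 0.5.
task_level = [1.0, 2.0, 3.0, 2.0, 1.0]
offset_estimate = [t + 0.5 for t in task_level]

mae = mean_absolute_error(task_level, offset_estimate)  # equals the offset
cc = pearson_cc(task_level, offset_estimate)            # unaffected by the offset

# Hypothetical per-participant CC values summarised as mean and std.
cc_mean, cc_std = mean_and_std([0.5, 0.7, 0.9])
```

Because the correlation is invariant to a constant shift, a model whose output carries an offset can still correlate strongly with the task level even though its MAE is inflated.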