Background: Verbal Autopsy (VA) is a tool commonly used in low- and middle-income countries to ascertain cause of death where most deaths are not assigned a medically certified cause. VA data therefore help set health priorities, inform policy and practice, and provide vital information where civil registration systems are weak. Physician diagnosis is used as the gold standard for determining cause of death from VA interviews, even though it is inconsistent and expensive. Conventional computer algorithms and machine learning approaches have been applied as alternatives. However, they fail to perform optimally because of data quality issues and the ineffective strategies they employ. We present a robust machine learning framework that can accurately classify cause of death using only narratives from VA interviews.
Methods: We acquired the VA narratives and preprocessed them. Using various feature engineering techniques, we represented the narratives as numeric vectors for twelve cause-of-death categories. To improve model performance, we applied data balancing, feature scaling, hyper-parameter tuning and dimensionality reduction. We then applied eight different classification approaches to the vectors to generate predictive models of cause of death. Models were validated using Precision, Recall, Accuracy, F1-score and Receiver Operating Characteristic Area Under Curve (ROCAUC).
Results: We used physician diagnosis as the gold standard for validating our models. Our five best classifiers attained Precision, Recall, Accuracy and F1-score of 95%, 94%, 93%, 92% and 91% respectively in classifying cause of death across all twelve disease categories. We report a Micro-Average ROCAUC of 96% and a Macro-Average ROCAUC of 95% across the twelve classes.
Conclusion: Our proposed robust machine learning framework can be a faster, cost-effective way to determine cause of death from rich, informative, unstructured VA narratives. This study can also serve as a benchmark for the comparability and generalisation of machine learning models in determining cause of death using VA data. The study was limited by data quality. Future work will combine responses and narratives as model inputs and apply deep learning architectures to cause of death classification using VAs.
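
The Micro-Average and Macro-Average ROCAUC figures reported above summarise a multi-class problem in two different ways. The sketch below shows one common way to compute them and is not necessarily the procedure used in this study: it assumes a one-vs-rest treatment of each class, with the macro average taken as the unweighted mean of the per-class AUCs and the micro average taken as a single AUC over the pooled class-versus-rest decisions. The function names, the class labels "A", "B", "C" and the scores are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive case is scored above
    a negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ovr_roc_auc(
    y_true: Sequence[str],
    y_score: Sequence[Dict[str, float]],
    classes: Sequence[str],
) -> Dict[str, float]:
    """One-vs-rest ROC AUC with macro and micro averaging (assumed scheme)."""
    per_class: List[float] = []
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_score]
        per_class.append(binary_auc(labels, scores))
        # Micro averaging pools every class-vs-rest decision into one problem.
        pooled_labels.extend(labels)
        pooled_scores.extend(scores)
    return {
        "macro": sum(per_class) / len(per_class),
        "micro": binary_auc(pooled_labels, pooled_scores),
    }


# Hypothetical toy data: three classes, four narratives, invented scores.
toy_classes = ["A", "B", "C"]
toy_true = ["A", "B", "C", "B"]
toy_scores = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.1, "B": 0.6, "C": 0.3},
    {"A": 0.2, "B": 0.2, "C": 0.6},
    {"A": 0.5, "B": 0.4, "C": 0.1},  # true class "B", but "A" scores highest
]

if __name__ == "__main__":
    print(ovr_roc_auc(toy_true, toy_scores, toy_classes))
```

In this toy example every class is perfectly ranked against the rest, so the macro average is 1.0, while the pooled micro average is 31/32 because one positive score (0.4) falls below a negative score from another class (0.5). The two averages can therefore differ even on the same predictions, which is why both are usually reported.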