INTRODUCTION Millions of patients attend emergency departments (EDs) around the world every year. Patients are triaged on arrival by a trained nurse who collects structured data and an unstructured free-text history of the presenting complaint. Natural language processing (NLP) uses computational methods to analyse and understand human language, and has been applied to data acquired at ED triage to predict a range of outcomes. The objective of this systematic review is to evaluate how NLP has been applied to ED triage, to assess whether NLP-based models outperform humans or current risk stratification techniques, and to assess whether incorporating free-text improves predictive performance compared with models that use only structured data.

METHODS All English-language peer-reviewed research that applied an NLP technique to free-text obtained at ED triage was eligible for inclusion. We excluded studies focusing solely on disease surveillance, and studies that used information obtained after triage. We searched the electronic databases MEDLINE, Embase, Cochrane Database of Systematic Reviews, Web of Science, and Scopus for medical subject headings and text keywords related to NLP and triage. Databases were last searched on 01/01/2022. Risk of bias in studies was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Due to the high level of heterogeneity between studies, a meta-analysis was not conducted. Instead, a narrative synthesis is provided.

RESULTS In total, 3584 studies were screened, and 19 studies were included. Population size varied greatly between studies, ranging from 762 simulated encounters to 1.8 million patients. The most common primary outcomes assessed were prediction of triage score, prediction of admission, and prediction of critical illness. NLP models achieved high accuracy in predicting need for admission, predicting critical illness, and mapping free-text chief complaints to structured fields. Overall, NLP models predicted admission with greater accuracy than emergency physicians, outperformed abnormal vital sign triggers and triage scores at predicting critical illness, and were more accurate than nurses at assigning triage scores in two out of three papers. Incorporating both structured data and free-text data improved results when compared to models that used only structured data. The majority of studies (79%) were assessed to have a high risk of bias, and only one study reported the deployment of an NLP model into clinical practice.

CONCLUSION Unstructured free-text triage notes contain valuable information that can be used by NLP models to predict clinically relevant outcomes. The use of NLP at ED triage appears feasible and could allow for early and accurate prediction of multiple important patient-oriented outcomes. However, there are few examples of implementation into clinical practice, most research is retrospective, and the potential benefits of NLP at triage are yet to be realised.