The early detection of large-vessel occlusion (LVO) strokes is increasingly important because these patients are potential candidates for endovascular therapy, the availability of which is limited. Prehospital LVO detection scales mainly contain symptom variables only; however, recent studies have shown that other types of variables could be useful as well. Our aim was to comprehensively assess the ability of several clinical variables to predict LVO and to develop an optimal combination of them using machine learning tools.
We retrospectively analysed data from a prospectively collected multi-centre stroke registry. Data on 41 variables were collected and divided into four groups (baseline vital parameters/demographic data, medical history, laboratory values, and symptoms). Following univariate analysis, the LASSO method was used to select an optimal combination of variables, and several machine learning methods (random forest (RF), logistic regression (LR), elastic net method (ENM), and simple neural network (SNN)) were applied to optimize the performance of the model.
A total of 526 patients were included. Several neurological symptoms were more common and more severe in the LVO group. Atrial fibrillation (AF) was more common and serum white blood cell (WBC) counts were higher in the LVO group, while systolic blood pressure (SBP) was lower. Using the LASSO method, nine variables were selected for modelling (six symptom variables, AF, chronic heart failure (CHF), and WBC count). With the selected variables and 10-fold cross-validation, the machine learning models achieved an area under the receiver operating characteristic curve (AUC) between 0.736 (RF) and 0.775 (LR), similar to the performance of the National Institutes of Health Stroke Scale (AUC: 0.790).
Our study shows that, although certain neurological symptoms are the best predictors of LVO, other variables (such as atrial fibrillation and chronic heart failure in the medical history, and white blood cell count) should also be included in multivariate models to optimize their performance.
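The AUC values quoted above can be read as the probability that a randomly chosen LVO patient receives a higher model score than a randomly chosen non-LVO patient. The following is a minimal sketch of that pairwise definition only, not the study's analysis code; the function name `auc` and both score lists are hypothetical and are not taken from the registry.

```python
from itertools import product


def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities, NOT data from the study.
lvo_scores = [0.9, 0.8, 0.6, 0.4]           # patients with LVO
non_lvo_scores = [0.7, 0.5, 0.3, 0.2, 0.1]  # patients without LVO

# 17 of the 20 possible pairs are ordered correctly.
print(round(auc(lvo_scores, non_lvo_scores), 3))  # prints 0.85
```

In a cross-validated setting such as the 10-fold scheme described above, a quantity of this kind would typically be computed on each held-out fold and then summarised across folds.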