Prior studies consistently show that both overinvestment and underinvestment impair the efficiency of business operations, which makes detecting these phenomena an important research problem. In this study, we therefore aim to develop an accurate machine-learning model to identify overinvestment among firms listed on the HSX and HNX stock exchanges in Vietnam. We compare six classifiers to find the best model for labeling a firm as overinvesting or not: Logistic Regression, K-Nearest Neighbor (KNN), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). Using a sample of 658 non-financial listed companies in Vietnam between 2011 and 2021, our results show that the most important predictor variable is "FCF" (free cash flow), with an importance value of 0.14. Although both Logistic Regression and Random Forest achieve high accuracy in identifying overinvesting firms, Random Forest exhibits slightly higher precision and recall for class 1 (overinvestment firms) than Logistic Regression. By contrast, the accuracy of the other four models (NB, KNN, DT, and SVM) is low, ranging from 0.53 to 0.67. At the microeconomic level, this research can help businesses gain insight into their financial performance, identify areas for improvement, and, by flagging potential cases of overinvestment, take proactive measures to avoid financial distress and improve profitability. Overall, the study contributes to the literature on financial analysis using machine learning techniques. We believe the findings can serve as a reference for future research that explores other important predictors of overinvestment in Vietnam and other emerging markets.
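For readers less familiar with the evaluation metrics named above, the following minimal sketch shows how accuracy, precision, and recall for class 1 (overinvestment firms) are computed from true and predicted labels. The function name `binary_metrics` and the toy label vectors are hypothetical and purely illustrative; they are not the study's data, code, or results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, plus precision and recall for the `positive` class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a count is empty.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels (assumed for illustration): 1 = overinvestment, 0 = otherwise.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision answers how many firms flagged as overinvesting truly are, while recall answers how many truly overinvesting firms were flagged; reporting both for class 1 matters because accuracy alone can hide poor detection of the class of interest.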