The growing complexity of sustainable development under financial market regulation has raised the importance of high-quality datasets. However, no integrated approach yet combines green-finance metrics with the data optimization they require. This study presents such an approach. It applies factorial design methodologies to a sample of 30 firms listed on the Saudi Stock Exchange. Five years of data (2018–2022) were analyzed, focusing on key financial metrics, ESG (environmental, social, and governance) scores, and sustainability factors. The analysis used machine-learning models (random forest and XGBoost), Principal Component Analysis (PCA), and regression techniques to evaluate prediction accuracy. Extending the data history from 1–2 years to 3–5 years reduced the mean squared error (MSE) by up to 40%, with the XGBoost model achieving an MSE of 0.03 and demonstrating better generalization. In contrast, random forest showed a near-perfect fit with an MSE of 0.00, which suggests a risk of overfitting. Sampling frequency also affected accuracy: weekly and monthly sampling outperformed daily intervals, improving the MSE by 15–20%. The study provides a framework for integrating ESG metrics into economic models, helping policymakers and industry leaders make informed decisions. The results also open avenues for future research in sustainable finance and data analysis.
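As a reading aid for the MSE figures quoted above, the following minimal sketch shows how a mean squared error and a relative MSE reduction between two configurations can be computed. The function names and all numeric values are illustrative assumptions, not the study's data or code; the toy predictions are chosen only so that the arithmetic is easy to follow.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mse: float, improved_mse: float) -> float:
    """Fraction by which improved_mse is lower than baseline_mse."""
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return (baseline_mse - improved_mse) / baseline_mse


# Hypothetical toy values (NOT taken from the study).
actual = [1.0, 2.0, 3.0, 4.0]
short_history_pred = [1.5, 2.5, 2.5, 4.5]  # every prediction is off by 0.5
long_history_pred = [1.0, 2.5, 3.0, 4.5]   # only two predictions are off by 0.5

mse_short = mean_squared_error(actual, short_history_pred)
mse_long = mean_squared_error(actual, long_history_pred)
reduction = relative_reduction(mse_short, mse_long)

print(f"short-history MSE: {mse_short:.3f}")
print(f"long-history MSE:  {mse_long:.3f}")
print(f"relative reduction: {reduction:.0%}")
```

A statement such as "reduced the MSE by up to 40%" corresponds to `relative_reduction` returning at most 0.40 across the compared configurations. An MSE that rounds to 0.00 on the data a model was fitted to is the pattern usually checked against held-out data before it is read as genuine accuracy.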