Breath acetone concentrations have been found to correlate with blood ketone levels. This evidence makes it possible to predict blood ketone levels using breath analysis and machine learning (ML). However, a good ML model requires a large amount of training data, and under certain conditions, such as during the COVID-19 pandemic, collecting that much data is difficult. To overcome this problem, we propose an augmentation technique that increases the amount of training data using a two-step synthetic minority oversampling technique (SMOTE). The first step increases the amount of training data by combining it with synthetic data, while the second step balances the data at each ketone level. Because SMOTE is typically used for classification, whereas this study aims to predict ketone levels as numerical output values, we also explain the strategy for applying SMOTE to regression. The proposed method was evaluated by feeding the data into several ML methods, including deep neural network regression (DNN-R), linear regression (ML-R), RANSAC regression (RC-R), K-nearest neighbour regression (KNN-R), decision tree regression (DT-R), random forest regression (RF-R), AdaBoost regression (AD-R), gradient boosting regression (GB-R) and XGBoost regression (XGB-R). According to the test results, compared with training without the proposed method, an increase in accuracy was obtained on