Accurate and fast prediction with an artificial intelligence (AI) model depends on the choice of input features at least as much as on the choice of model. This paper investigates the effect of input feature selection on emission models of light diesel vehicles driven on real roads. A gradient boosting regression (GBR) model was trained to predict nitrogen oxide (NOx) emissions, carbon dioxide (CO2) emissions, and fuel consumption of diesel vehicles under real driving conditions in urban, suburban, and highway scenarios. A portable emissions measurement system (PEMS) was used to collect vehicle data and environmental conditions. The vehicle was driven on two routes: the model was trained on data from the first route and then used to predict the emissions of the second route. Ten features were related to the NOx model and nine to the CO2 model. The features were ranked by importance, and the models were trained with different numbers of features as input. The best NOx model had coefficient of determination (R2) values of 0.99, 0.99, and 0.99 for the urban, suburban, and highway driving patterns, respectively; its predictions for the second route had R2 values of 0.88, 0.89, and 0.96, respectively. The best CO2 model had R2 values of 0.98, 0.99, and 0.99 for the three driving patterns, respectively; its predictions for the second route had R2 values of 0.79, 0.82, and 0.83, respectively. The most important features for the NOx model are mass air flow rate (g/s), exhaust flow rate (m3/min), and CO2 (ppm), while the most important features for the CO2 model are exhaust flow rate (m3/min) and mass air flow rate (g/s). Regression models based on only the top three features can give predictions very close to the measured data.
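The workflow described above (fit on one route, rank features by importance, retrain on the top-ranked features, score on a second route) can be illustrated with a minimal sketch. It assumes scikit-learn's GradientBoostingRegressor with default hyperparameters, which are not necessarily the settings used in the study. The feature names, the synthetic data, and the make_route helper are hypothetical stand-ins for the PEMS measurements, so the printed scores say nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical feature names; synthetic data stand in for PEMS measurements.
FEATURES = ["mass_air_flow", "exhaust_flow", "co2_ppm", "speed", "ambient_temp"]


def make_route(n_samples):
    """Build a synthetic 'route' whose target depends only on the first three features."""
    X = rng.uniform(0.0, 1.0, size=(n_samples, len(FEATURES)))
    noise = rng.normal(0.0, 0.05, n_samples)
    y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + noise
    return X, y


X_route1, y_route1 = make_route(2000)  # route used for training
X_route2, y_route2 = make_route(1000)  # route held out for prediction

# Step 1: fit on all features to obtain an importance ranking (most important first).
full_model = GradientBoostingRegressor(random_state=0).fit(X_route1, y_route1)
ranking = np.argsort(full_model.feature_importances_)[::-1]

# Step 2: retrain with the top-k features and score each model on the second route.
scores = {}
for k in range(1, len(FEATURES) + 1):
    cols = ranking[:k]
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X_route1[:, cols], y_route1)
    scores[k] = r2_score(y_route2, model.predict(X_route2[:, cols]))
    print(k, [FEATURES[i] for i in cols], round(scores[k], 3))
```

Comparing scores[k] across k shows how much predictive accuracy is retained as features are dropped, which is the kind of comparison behind the observation that a model built on the top three features can already track the measured data closely.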