The influence of ionic liquids (ILs) characteristics, lignocellulosic biomass (LCB) properties, and process conditions on LCB pretreatment is not well understood. In this study, a total of 129 experimental data on cellulose, hemicellulose, lignin, and solid recovery from IL-based LCB pretreatment were compiled from literature to develop machine learning models. Following data imputation, bilayer artificial neural network (ANN) and random forest (RF) regression were developed to model the dataset. The full-featured ANN following Bayesian hyperparameter (HP) optimization though offered excellent fit on training (R2:0.936–0.994), cross-validation (R2CV) performance remained marginally poor, i.e., between 0.547 and 0.761. The fitness of HP-optimized RF models varied between 0.824–0.939 for regression, and between 0.383–0.831 in cross-validation. Temperature and pretreatment time had been the most important predictors, except for hemicellulose recovery. Bayesian predictor selection combined with HPO improved the R2CV boundary for ANN (0.555–0.825), as well as for RF models (0.474–0.824). As the predictive performance of the models varied depending on the target response, the use of a larger homogeneous dataset may be warranted.