In oil and gas drilling, timely and accurate identification of formation lithology is an important factor in drilling safety. Traditional methods used in drilling and logging, such as elemental crossplots, identify complex lithology inaccurately and inefficiently. To address this, the present study applies the Categorical Boost (CatBoost) model to lithology identification. However, because CatBoost has many hyperparameters, it is difficult to optimize the model's predictions by tuning them manually. Therefore, Kernel Principal Component Analysis (KPCA) is introduced to extract a smaller set of more important features from the original data and eliminate redundant information, and it is combined with the Bayesian Optimization (BO) algorithm, which optimizes the hyperparameters during training, thereby improving the prediction performance of CatBoost. Two experiments were designed to verify the model's recognition ability. The final test results showed that the proposed KPCA-BO-CatBoost model had the best overall performance, with a lithology recognition accuracy above 90%. The model identifies formation lithology effectively, improves the efficiency and accuracy of lithology identification, and provides important guidance for subsequent drilling operations.
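As an illustration of the feature-extraction step only, the following is a minimal sketch of kernel PCA written with NumPy. It is not the study's implementation: the RBF kernel, the value of `gamma`, the number of retained components, the function name `rbf_kernel_pca`, and the synthetic input data are all assumptions made for the example, since the abstract does not state them.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Project the rows of X onto the leading kernel principal components.

    Hypothetical helper for illustration; the RBF kernel is an assumption.
    """
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off negatives
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix, i.e. center the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Projection of the training samples: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Synthetic stand-in for logging data: 60 samples, 8 features (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

Z = rbf_kernel_pca(X, n_components=3, gamma=1.0 / X.shape[1])
print(Z.shape)
```

In the pipeline the abstract describes, reduced features such as `Z` would then be passed to the CatBoost classifier, with its hyperparameters selected by Bayesian optimization; those two stages are not shown here.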